A Comprehensive, Valid, and Reliable Tool to Assess the Degree of Responsibility of Digital Health Solutions That Operate With or Without Artificial Intelligence: 3-Phase Mixed Methods Study

Background: Clinicians’ scope of responsibilities is being steadily transformed by digital health solutions that operate with or without artificial intelligence (DAI solutions). Most tools developed to foster ethical practices lack rigor and do not concurrently capture the health, social, economic, and environmental issues that such solutions raise.
Objective: To support clinical leadership in this field, we aimed to develop a comprehensive, valid, and reliable tool that measures the responsibility of DAI solutions by adapting the multidimensional and already validated Responsible Innovation in Health Tool.
Methods: We conducted a 3-phase mixed methods study. Relying on a scoping review of available tools, phase 1 (concept mapping) led to a preliminary version of the Responsible DAI solutions Assessment Tool. In phase 2, an international 2-round e-Delphi expert panel rated the importance, clarity, and appropriateness of the tool’s components on a 5-level scale. In phase 3, 2 raters independently applied the revised tool to a sample of DAI solutions (n=25), interrater reliability was measured, and final minor changes were made to the tool.
Results: The mapping process identified a comprehensive set of responsibility premises, screening criteria, and assessment attributes specific to DAI solutions. e-Delphi experts critically assessed these new components and provided comments (n=293) to increase content validity; after round 2, consensus was reached on 85% (22/26) of the items surveyed. Interrater agreement was substantial for one subcriterion and almost perfect for all other criteria and assessment attributes.
Conclusions: The Responsible DAI solutions Assessment Tool offers a comprehensive, valid, and reliable means of assessing the degree of responsibility of DAI solutions in health. As regulation remains limited, this forward-looking tool has the potential to change practice toward more equitable as well as economically and environmentally sustainable digital health care.
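The abstract describes interrater agreement as "substantial" and "almost perfect", which are the labels of the Landis and Koch benchmarks usually applied to Cohen's kappa. The statistic is not named in this excerpt, so the following is a minimal sketch assuming Cohen's kappa for two raters scoring solutions on the A to D attribute scales; the function names and the ratings are hypothetical, not study data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each assigned one category to the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items on which both raters chose the same category
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over categories
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = counts_a.keys() | counts_b.keys()
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / n**2
    if p_expected == 1.0:
        # Both raters used one identical category throughout; kappa is undefined,
        # so this sketch reports perfect agreement by convention.
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


def landis_koch_label(kappa):
    """Map a kappa value to the Landis and Koch (1977) agreement bands."""
    if kappa < 0:
        return "poor"
    bands = ((0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial"))
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical A-D ratings of 10 solutions on one attribute (illustrative only)
rater_1 = list("AABBCCDDAB")
rater_2 = list("AABBCCDDAC")
kappa = cohens_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.2f} ({landis_koch_label(kappa)})")
```

With these illustrative ratings the raters agree on 9 of 10 solutions, chance agreement is 0.25, and kappa is about 0.87, in the "almost perfect" band. Kappa is preferred over raw percent agreement because it discounts the agreement expected from each rater's rating frequencies alone.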


A.1. The concept mapping process leading to the first version of the Tool
The objective of Phase 1 was to identify principles and best practices specific to D/AI solutions that could help us adapt the RIH Assessment Tool into the Responsible D/AI Solutions Assessment Tool. To do so, we followed a concept mapping process that relied on a scoping review our team published recently [17]. Concept mapping refers to a "structured process" that gathers "input from multiple participants" and uses qualitative as well as quantitative analyses to produce an exhaustive map of a conceptual domain [23]. To map "as completely as possible all of the key facets" [23] of responsibility in D/AI health solutions (i.e., the conceptual domain of interest in our study) and the relationships between key constructs, we followed a three-step process: 1. Generating the conceptual domain; 2. Structuring the conceptual domain; and 3. Representing the conceptual domain.
Before describing these steps, we explain below why the RIH Tool was used as the backbone to develop the Responsible D/AI Solutions Assessment Tool.

Rationale for adapting the RIH Tool
The RIH Tool is an evidence-informed tool that measures the degree of responsibility of a given health innovation along 9 attributes that capture product-, process-, and organization-level characteristics of responsibility [23]. It comprises 4 premises, 4 inclusion and exclusion criteria, 9 responsibility attributes with individual four-level Likert-like rating scales, and a scoring system that takes into account the quality of the sources of information used in the assessment. Construct validity and reliability were established, respectively, through an e-Delphi study conducted with 4 groups of international experts (health technology assessment, bioethics, biomedical engineering, and Responsible Research and Innovation) [22] and an interrater reliability assessment study [23]. To the best of our knowledge, this is one of the rare tools in the field of Responsible Research and Innovation (RRI) that is specific to the health sector and that establishes a quantitative measure of the degree of responsibility of a health innovation [21].
Though the RIH Tool is applicable to health innovations that contain D/AI components (e.g., data) [21], it does not capture specific responsibility issues that are raised by these components (e.g., data management). Since the development of the RIH framework [19] and its accompanying RIH Tool (within a seven-year research program funded by the Canadian Institutes of Health Research: 2015-2023), the field of digital and AI ethics has grown exponentially [37,45,46]. Ground-breaking advocacy, scholarly, and policy work has brought to light the numerous ethical and responsibility concerns that arise with the development and use of D/AI solutions, both in the health field and across other sectors. For instance, experts have made significant headway elucidating data management issues (e.g., biased datasets), the environmental footprint of D/AI solution development and use (e.g., data centres), AI governance mechanisms for organizations and businesses, and best practices for D/AI engineering professions [3,12,39,15].
Because the state of the literature on responsible D/AI solutions now forms a solid body of knowledge and because the RIH Tool is scientifically sound in its structure and content, we aimed to adapt the RIH Tool to the specificities of D/AI solutions in health. Through a three-step adaptation process, we ensured that our adaptations fit, both at a conceptual and a practical level, with the rationale, objective, and approach of the RIH Tool as well as with the realities of the rapidly evolving D/AI industry.
Step 1: Generating the conceptual domain: What key principles and best practices characterize responsible D/AI solutions?
To identify key D/AI responsibility principles and how they should be operationalized (through questions, recommendations, criteria, 'dos and don'ts,' etc.), we relied on our scoping review of practice-oriented tools that aim to guide the implementation of responsibility principles throughout the lifecycle of D/AI solutions [17]. Because many ethical principles (e.g., privacy, accountability, robustness, beneficence [3,40,45]) have been proposed, either specifically for health care [13] or for multiple sectors [14], and either for digital solutions [15] or for AI [16,39], our search strategies were multidisciplinary and included the grey literature.
We searched 6 academic databases (PubMed, Web of Science, arXiv, Institute of Electrical and Electronics Engineers Xplore, IBSS Proquest Abstracts, Sociological Abstracts), 3 grey literature databases (OpenGrey, CMA CPG Infobase, Government of Canada Publications), and 2 search engines (Advanced Google Search, DuckDuckGo), using key terms related to ethical AI (e.g., fairness, explainability), digital solutions (e.g., eHealth, mHealth), and RIH attributes (e.g., inequalities, environmental sustainability, inclusiveness, frugality). We included tools that were 1) practice-oriented, defined as frameworks and/or sets of principles with clear explanations on how to apply them in practice; 2) applicable to solution design, development, assessment, use, or audit; 3) developed by academics, governments, NGOs, and/or reputable private sector sources; and 4) composed of dimensions, criteria, and/or scales to reflect on, judge, make a decision about, identify, or report a responsibility challenge. We excluded tools that were 1) highly technical (e.g., code on GitHub); 2) highly specific (e.g., a cybersecurity measure); 3) very extensive (e.g., a fully detailed public procurement program); or 4) lacking in substance (e.g., a short blog post).
Our dataset comprised 56 tools stemming from the academic (n=12) and grey literature (n=44) and developed either specifically for the health field (n=19) or for generic applicability (n=37). Following a qualitative thematic analysis strategy [34], LR and RRO applied 40 codes, which are listed below, to identify the responsibility principles present in these tools. These codes, together with the definitions and practical recommendations provided by the tools' authors, constituted the conceptual material to be structured in Step 2.
Step 2: Structuring the conceptual domain: What are the relationships between the principles and best practices aiming to foster responsible D/AI solutions?
To structure this conceptual material, we conducted different statistical analyses to define and visualize the relationships between the 40 principles. These quantitative analyses and their findings are reported in detail in our scoping review [17]. The descriptive analyses first examined the distribution of the principles across the 56 tools, which shed light on the responsibility constructs they prioritize or disregard. For instance, 'environmental sustainability' was not found in any of the health-specific tools (n=19), and 21 principles were disregarded by 50% or more of all tools. To get a better grasp of the key blind spots and normative preferences of the tool creators, we stratified the analyses along three subsets of tools (those from academia [n=15], governments [n=18], and the business sector [n=8]) and performed a network analysis. Table 1 indicates the results of these analyses, which provide a ranking order where more than one principle can occupy the same position. In a network analysis, the prominence that tools give to certain principles can be revealed by examining the connection patterns ('links') between tools and the principles ('nodes') that co-occur in them (i.e., 'Principle A' is linked to 'Tool 1' when the latter relies on that principle) [35]. We calculated the degree centrality of each principle through a normalized inDegree, which represents the proportion of connections that a principle has compared to all possible connections it may have with the tools in the network. It provides an indication of the relative importance of a principle within a subset of tools, as it measures the extent to which a given principle is connected to the tools in that subset. The higher the inDegree, the more influential the principle is within the network.
Results from Table 1 thus facilitated a systematic comparison of the responsibility constructs these tools primarily sought to operationalize.
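The normalized inDegree described above reduces to a simple proportion in a two-mode (tool-principle) network: the number of tools in a subset that rely on a principle divided by the number of tools in that subset. The sketch below assumes that reading and uses hypothetical tools and principle codes; it also assumes dense ranking for ties, since the text states only that several principles can share a position in Table 1.

```python
from collections import Counter


def normalized_in_degree(tool_principles, all_principles=()):
    """Normalized in-degree of each principle in a bipartite tool-principle network.

    tool_principles maps each tool in a subset to the principle codes it relies on.
    A principle's in-degree is the number of tools linking to it; dividing by the
    number of tools gives the share of possible tool links it actually has.
    """
    n_tools = len(tool_principles)
    links = Counter(p for principles in tool_principles.values() for p in set(principles))
    for p in all_principles:
        links.setdefault(p, 0)  # keep principles that no tool in the subset mentions
    return {p: links[p] / n_tools for p in links}


def tied_rank(scores):
    """Dense ranking by descending score: principles with equal scores share a position."""
    distinct = sorted(set(scores.values()), reverse=True)
    positions = {s: i + 1 for i, s in enumerate(distinct)}
    return {p: positions[s] for p, s in scores.items()}


# Hypothetical subset of four tools (not the 56 tools of the scoping review)
subset = {
    "tool_1": {"privacy", "transparency", "fairness"},
    "tool_2": {"privacy", "transparency"},
    "tool_3": {"privacy", "accountability"},
    "tool_4": {"privacy", "fairness"},
}
scores = normalized_in_degree(subset, all_principles=["environmental sustainability"])
for principle, rank in sorted(tied_rank(scores).items(), key=lambda item: item[1]):
    print(rank, principle, round(scores[principle], 2))
```

Principles absent from every tool in a subset, such as 'environmental sustainability' among the health-specific tools, only receive an inDegree of 0 if they are passed in explicitly, which is why the function accepts the full list of codes.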
Step 3: Representing the conceptual domain: What adaptations are required and where shall they be introduced?
To develop the first version of the Responsible D/AI Solutions Assessment Tool, we followed an iterative and deliberative process led by PL. We began by mapping all 40 principles to each key component of the RIH Tool (see Table 2):
• Its 4 premises, defining how responsibility is approached;
• Its 4 screening criteria, defining baseline requirements that must be met for an innovation to be eligible for assessment (i.e., to be considered potentially responsible); and
• Its 9 assessment attributes, which use a four-level scale to measure the degree to which a given responsibility characteristic is present.

Business model (original RIH attribute)
Refers to the components through which an organization creates, delivers and captures social and economic value. A business model typically entails a tension between the redistribution of financial returns to shareholders and the provision of a high-quality innovation.
The business model of organizations that seek to provide more value to users, purchasers and society may possess the following characteristics:
• Pursue a social and/or environmental mission, operate on a not-for-profit basis or reinvest the majority of the revenues in their mission (e.g., social enterprises)
• Make the innovation freely usable or exploitable by others (i.e., open source, product licensing waivers, do-it-yourself)
• Adopt a pricing scheme based on ability to pay or a redistributive logic (e.g., customers who "buy one, give one")
• Employ people with particular needs (e.g., low literacy, disabilities)
• Comply with social responsibility programs (e.g., Certified B Corporation, SA8000 standard for decent work, ISO 26000 for social responsibility)
The business model of the organization that produces the innovation possesses… (scale ranging from three or more of the characteristics described to none)

Business model (adapted for the Responsible D/AI Solutions Assessment Tool)
Refers to the components through which an organization creates, delivers and captures social and economic value. A business model typically entails a tension between the redistribution of financial returns to shareholders and the provision of a high-quality D/AI solution.
The business model of organizations that seek to provide more value to users, purchasers and society may possess the following characteristics:
• Pursue a social and/or environmental mission, operate on a not-for-profit basis or reinvest the majority of the revenues in their mission (e.g., social enterprises);
• Make the solution and its hardware components (see Scope of assessment) freely usable or exploitable by others (i.e., open source, product licensing waivers, do-it-yourself);
• Adopt a pricing scheme based on ability to pay or a redistributive logic (e.g., fees modulated according to user segments);
• Employ people with particular needs (e.g., low literacy, disabilities);
• Comply with social responsibility programs (e.g., Certified B Corporation, SA8000 standard for decent work, ISO 26000 for social responsibility).
The business model of the organization that makes the solution available to end users possesses…
Note: Beneficence; Org. Responsibility; Governance; Humancentric

Data governance (Proposed additional attribute for new Tool)
Refers to the stewardship, structures and processes the organization sets in place to ensure full control over the entire lifecycle of the data it gathers, exploits, generates, stores and/or shares with users and third parties (voluntarily or not). From data collection to data destruction, the organization and its leaders must remain transparent about, accountable for, and swiftly responsive to any breach in data protection and any other issue affecting the lifecycle of the D/AI solution's data.
Organizations producing responsible D/AI solutions make their high-level executives and employees knowledgeable about, and able to report to external auditors on, the sensitivity and scope of use of all datasets linked to their solutions. Mechanisms to achieve responsible data governance include:
• Fully active oversight committees whose members' conflicts of interest are publicly declared;
• Explicit compliance with the laws and regulatory frameworks where users are located;
• Adherence to industry standards specific to D/AI solutions (e.g., ISO 13482:2014 for safety of personal care robots, ISO/TS 82304-2 for quality and reliability of health and wellness apps);
• Training programs and a certification system for in-house data stewards;
• Fully functional and active reporting systems.
Data governance of the organization that makes the solution available to end users possesses… (scale to be developed after the first construct validity step)
Note: Quality assurance; Accountability; Data management; Privacy; Responsiveness; Transparency; Org. responsibility

Sources used to identify four groups of experts for the e-Delphi panel
The objective of Phase 2 was to ensure the Tool's content validity. To this end, we targeted experts with high-quality publications in the D/AI solutions field from four broad disciplinary domains: 1) health sciences and public health; 2) engineering, design, natural sciences, mathematics, statistics, and operational research; 3) social sciences and humanities; and 4) business, public administration, management, law, and accounting.
Table 3. Overview of the academic journals in which experts published. Note: "Others" included 74 journals with 5 or fewer experts.
We first considered the authors of the 302 publications assessed for eligibility in our scoping review because it covered all key dimensions of the Tool (population health, health system, economic, organizational, and environmental issues). It included 146 peer-reviewed academic articles and 156 grey literature documents (see PRISMA-ScR flow chart in the article). These publications were screened again by RRO, LR, and PL to identify those that aligned with the scope of the e-Delphi exercise. We excluded publications that were highly technical or lacking in substance. We retained a total of 177 publications: 140 academic articles (n=600 experts) and 37 grey literature documents (n=184 experts). Table 3 lists the most frequent academic journals in which these 600 experts' work has been published.
We then identified the email addresses of 755 of the 784 authors, searching the Internet when addresses were not available in the publications; all co-authors were included. In Round 1, we added to these 755 experts a purposeful sample of experts (n=44) identified through our research collaborators' networks. In Round 2, two additional experts were recommended by Round 1 participants.
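The recruitment figures above can be cross-checked with simple arithmetic. This sketch only re-adds counts stated in the text; the derived values (email coverage, and the size of the Round 1 pool, assuming all 755 reachable authors plus the 44 purposefully sampled experts were approached) are computed here, not separately reported in this section.

```python
# Counts stated in the two paragraphs above
screened = {"academic": 146, "grey": 156}   # publications assessed for eligibility
retained = {"academic": 140, "grey": 37}    # publications retained after re-screening
authors = {"academic": 600, "grey": 184}    # experts identified through retained publications
emails_found = 755
purposeful_round1 = 44

assert sum(screened.values()) == 302  # matches the 302 publications assessed

total_retained = sum(retained.values())
total_authors = sum(authors.values())
email_coverage = emails_found / total_authors       # share of authors who could be contacted
round1_pool = emails_found + purposeful_round1      # assumption: all were approached in Round 1

print(f"retained publications: {total_retained}")
print(f"authors identified: {total_authors}")
print(f"email coverage: {email_coverage:.1%}")
print(f"Round 1 pool: {round1_pool}")
```

The totals agree with the reported 177 publications and 784 authors; email addresses were found for about 96.3% of them, giving a Round 1 pool of 799 experts under the stated assumption.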
The aim was to have a balanced representation of the 4 disciplinary fields.

Section 1 - Introduction to the assessment tool under development
The aim of the tool that is under development is to assess the degree of responsibility of digital solutions in health and social care that operate with or without Artificial Intelligence (AI). We use the term "D/AI solutions" to refer to both types of solution, which are defined as follows:
• Digital solution: an electronic system, including both hardware and software, that can generate, store and/or process data.
  o For example, a virtual reality system can provide patients who suffer from mental health problems with a series of therapeutic sessions where they virtually interact with various avatars, objects, animals and/or environments.
• AI solution: an algorithmic system that can infer patterns and draw conclusions from data without explicit programming.
  o For example, an AI solution can be trained to process electrocardiogram (ECG) recordings to detect the presence of cardiac problems and provide a medical interpretation.
What kind of tool are we developing? Who will apply the tool? Who will use its results?
The tool under development is inspired by our work on Responsible Innovation in Health (RIH), which led to the RIH Assessment Tool. The preliminary version of the new tool that you are kindly asked to appraise follows from a structured review of the scientific and grey literature conducted between December 2020 and May 2021. We analyzed a corpus of 56 practice-oriented frameworks and tools, which yielded 40 principles used to approach responsibility in D/AI solutions (paper in preparation). We adapted the principles that were most conducive to measuring the degree of responsibility of D/AI solutions in health and social care and integrated them into the RIH Assessment Tool.
The tool is structured around the RIH conceptual framework and follows the logic of the RIH Tool. It is a normative tool that will provide a quantitative measure of the degree of responsibility of D/AI solutions. The tool entails an evidence-informed assessment process and will be applied by people who possess research skills and are able to search for, retrieve and critically read the scientific literature. It will be freely usable with proper academic citation.
The results generated by the tool are meant to inform the decisions of D/AI solution developers and of those who influence the supply side such as entrepreneurs, investors, research funders, incubators, accelerators, etc. The results will also inform the demand side, including individuals who purchase, prescribe, implement or use D/AI solutions such as patients, clinicians, health and social care managers, etc.

What falls within the scope of the assessment of a D/AI solution's degree of responsibility?
Because D/AI solutions typically rely on a wide-ranging network of digital devices and infrastructures, we set two conditions to determine what components should be included in the assessment: 1) their 'raison d'être' is to support the D/AI solution; AND 2) they are part of the minimal requirements for the D/AI solution to deliver its service.
• For instance, a portable finger sensor enabling patients to make the ECG recordings that an AI solution uses to detect cardiac problems meets these two conditions:
  o its raison d'être lies with the AI solution because it fulfills no other purpose; and
  o it is a minimal requirement because the AI solution cannot detect cardiac problems without it.
• In contrast, the smartphone or tablet the patient uses to access the same ECG-based AI solution falls outside the scope of the assessment because the first condition is not met:
  o the raison d'être of a smartphone or tablet is not to support this particular AI solution.
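The two scope conditions form a logical AND, which can be sketched as follows. The class and function names are hypothetical; the two components reproduce the ECG examples, and the smartphone is assumed to meet the second condition, since the text only states that it fails the first.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """A device or infrastructure element that a D/AI solution relies on."""
    name: str
    exists_to_support_solution: bool  # condition 1: its raison d'être is to support the D/AI solution
    minimally_required: bool          # condition 2: the solution cannot deliver its service without it


def in_assessment_scope(component: Component) -> bool:
    # Both conditions must hold; failing either one places the component out of scope.
    return component.exists_to_support_solution and component.minimally_required


components = [
    Component("portable ECG finger sensor", exists_to_support_solution=True, minimally_required=True),
    # Needed to access the AI solution (assumed), but it exists for many other purposes
    Component("patient's smartphone or tablet", exists_to_support_solution=False, minimally_required=True),
]
in_scope = [c.name for c in components if in_assessment_scope(c)]
print(in_scope)
```

Only the finger sensor is kept, so its hardware characteristics (e.g., for the Business model attribute) are assessed together with the AI solution, while the general-purpose smartphone is not.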
Overview of the components of the tool that are appraised in Round 1 of the e-Delphi survey
Figure 1 provides an overview of the tool's key components and highlights which components you are kindly asked to appraise and comment on in Round 1 of this e-Delphi survey and which components will be part of Round 2.

Figure 1. Flow chart of the tool's assessment components
• Located at the top of the figure, the tool's premises are meant to clarify how responsibility in D/AI solutions should be approached.
  o Your input on these premises is optional in Round 1.
The application of the tool will entail a three-step process:
• The screening step aims to determine, through nine inclusion and exclusion criteria, whether a D/AI solution may potentially qualify as a responsible solution.
  o We need your input on two of these criteria in Round 1.
  o The criteria from the original RIH Assessment Tool that have not undergone any changes do not need to be assessed because their validity and reliability are already established.
• The assessment step ascertains the presence of responsibility features through twelve attributes organized into five value domains. All attributes will be assessed through a four-level Likert-like scale ranging from A to D, where A implies a high degree of responsibility and D implies no particular signs of responsibility.
  o We need your input on five of these attributes in Round 1.
  o We will develop the scales following your input, and you will be kindly asked to appraise them in Round 2.
  o The attributes from the original RIH Assessment Tool do not need to be appraised.
• The rating step will determine the outcomes of the assessment with the help of a scoring system that takes into consideration the availability and quality of the sources of information used to score each attribute.
  o The scoring system will be appraised in Round 2.
When completing the e-Delphi survey, please keep in mind that the tool should be:
• Consistent with current knowledge
o Charity, not-for-profit organization, non-governmental organization (NGO) or multilateral body (e.g., WHO, United Nations)
o For-profit organization, professional consultant firm or privately funded research institution
o Government or arm's length public administration agency (e.g., assessment, regulation, standards and norms, procurement, etc.)
o Higher education (e.g., college, university) or publicly funded research institution
o Health care facility (e.g., hospital, outpatient clinic, community health service, etc.)
o Other. Please specify [free text]:
To describe the diversity of the whole study participant sample, we ask one question pertaining to gender identity, one question about geographical location, and one question about years of experience. These data will be aggregated and will not be used for other purposes.

Q 2.4 With which gender do you identify?
o Gender-diverse (including but not limited to nonbinary, gender-fluid, gender non-conforming)

Section 3 - Premises of the tool under development (optional)
This section contains nine questions that are optional. You may wish to read the content before proceeding to the next section of the survey.
Recognizing the broader digital sociotechnical ecosystem in which D/AI solutions evolve, four premises clarify how this assessment tool approaches responsibility. You are invited to rate their importance and clarity on a five-level scale.

Responsibility is linked to the context of use
The overall responsibility of a given D/AI solution is intimately linked to how and where it is used. Although a D/AI solution easily crosses geographic boundaries, its developers may find it much harder to know each context of use well. Nonetheless, the tool should be applied in view of the social, cultural, economic and political characteristics of the context where the intended users are located.

Responsibility means aiming for collective benefits
Although a D/AI solution that provides individual health benefits is valuable, a responsible D/AI solution should seek to increase our ability to attend to collective needs and challenges. This may imply, for instance, addressing the root causes of a health or social care problem rather than simply establishing an individual risk level.

AI for Good is not automatically responsible
Several D/AI solutions are being developed with the explicit intent to 'do good', that is, to alleviate social problems and/or contribute to major societal challenges such as the United Nations Sustainable Development Goals (SDGs). Though AI for Good (AI4Good) solutions may generate positive impacts, one should not presume that they are automatically responsible. We thus suggest applying the tool before concluding whether a given AI4Good solution is responsible or not.

Digital literacies and Internet connectivity are "super-determinants" of health
There is growing evidence that the capabilities to access, use and benefit from digital tools and systems, at both the individual and group levels, should be considered "super-determinants of health" because they act 'upstream' on other known determinants of health such as education, housing or employment. Access to the latter increasingly unfolds through online transactions, thereby requiring various digital literacies, proper Internet connectivity, an affordable data plan, and low-cost devices that can run recent software releases.
Hence, from a health equity perspective, what is at stake for a D/AI solution in health and social care is not only knowing how to use it, but also having the broader digital capabilities (e.g., skills and competence) and capacities (e.g., means and resources) to materialize its likely benefits. This tool thus recognizes that most D/AI solutions are likely to increase health inequalities unless universal access to bandwidth is achieved and everyone is equipped and supported to become digitally literate.

Section 4 - The Screening step inclusion criteria
This section contains three questions. You are invited to rate on a five-level scale the applicability of two inclusion criteria and suggest additional criteria.
The Screening step relies on three inclusion criteria that are meant to swiftly identify solutions that: 1) meet the Digital or AI-based solution definitions; 2) effectively and safely address at least one determinant of health; and 3) explain the relevance of digitalization.

Digital or AI-based solution
An AI solution is an algorithmic system that can infer patterns and draw conclusions from data without explicit programming.
A digital solution is an electronic system that can generate, store, and/or process data and may include hardware.
Does the solution meet one of the two definitions above?
The following inclusion criterion is shown for your information and does not need to be assessed. It is part of the original RIH Assessment Tool and has already been validated.

Determinants of health
Refer to the factors inside and outside the health system that determine health across one's life course, which include: •

4.3 Relevance of digitalization
Digitalization is a relatively recent technological trend around which promises abound. Yet, because of the systemic nature of D/AI solutions and the lucrative industry that has emerged around data exploitation, not all D/AI solutions are relevant in and of themselves (e.g., using face recognition to admit a patient to a health or social care facility), and some may increase the overall burden of care. Turning a non-digital means into a new D/AI solution should thus substantially improve on the current means of fulfilling the same purpose, and the relevance of the D/AI solution should be clearly explained.

Section 5 - The Screening step exclusion criteria
This section contains three questions. You are invited to rate on a five-level scale the applicability of two exclusion criteria and suggest additional criteria.
The tool aims to establish a degree of responsibility rather than measure 'irresponsibility.' The Screening step thus relies on three exclusion criteria meant to exclude from the assessment solutions that: 1) have not reached the General Availability (GA) stage; 2) are produced by an organization involved in irresponsible corporate actions; or 3) do not meet minimal responsibility requirements in the D/AI industry.
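Taken together, the three inclusion and three exclusion criteria act as a gate before the Assessment step: a solution proceeds only if it meets every inclusion criterion and triggers none of the exclusion criteria. The sketch below encodes that reading with hypothetical criterion keys and a hypothetical candidate; in the tool itself, each criterion is a question answered by the assessor on the basis of the available evidence.

```python
# Hypothetical criterion keys; the tool phrases each criterion as a question.
INCLUSION_CRITERIA = (
    "meets_digital_or_ai_definition",
    "addresses_determinant_of_health",
    "relevance_of_digitalization_explained",
)
EXCLUSION_CRITERIA = (
    "general_availability_not_reached",
    "corporate_social_irresponsibility",
    "lacks_minimal_responsibility_requirements",
)


def screen(answers):
    """Return (eligible, reasons) for one D/AI solution.

    answers maps every criterion key to True or False. The solution proceeds to the
    Assessment step only if it meets all inclusion criteria and triggers no exclusion.
    """
    reasons = [f"inclusion not met: {c}" for c in INCLUSION_CRITERIA if not answers[c]]
    reasons += [f"excluded: {c}" for c in EXCLUSION_CRITERIA if answers[c]]
    return not reasons, reasons


candidate = {
    "meets_digital_or_ai_definition": True,
    "addresses_determinant_of_health": True,
    "relevance_of_digitalization_explained": True,
    "general_availability_not_reached": True,  # e.g., still in beta in the users' region
    "corporate_social_irresponsibility": False,
    "lacks_minimal_responsibility_requirements": False,
}
eligible, reasons = screen(candidate)
print(eligible, reasons)
```

Returning the reasons alongside the decision keeps the screening outcome traceable: a solution that has not reached GA is set aside for a later assessment rather than scored.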

Exclusion criteria
5.1 General Availability stage not reached
Because a D/AI solution evolves rapidly, its degree of responsibility can be established more meaningfully at the General Availability (GA) stage. While at the Release to Manufacturing (RTM) stage a D/AI solution is of sufficient quality for mass distribution, GA refers to a point where necessary commercialization activities including security and compliance tests have been completed. When a D/AI solution has not reached the GA stage, we recommend postponing its assessment.
Has the solution reached the GA stage in the region where the users concerned by this assessment are located?
The following exclusion criterion is part of the original RIH Assessment Tool and does not need to be assessed.

Corporate Social Irresponsibility
Refers to legal or illegal corporate actions that can harm people, animals or the environment. Examples of such actions by organizations producing D/AI solutions are often linked to the lack of, or the ability to circumvent, country-specific regulations addressing labour, fiscality and environmental issues, and to the difficulty of tracing their electronic suppliers' practices. Harmful corporate actions may be observed in the following domains:
• Animal welfare (physical and psychological, wildlife habitats)
• Community (indigenous or local communities, conflicts associated with rare-earth metals)
• Diversity (women or other underrepresented groups on boards of directors or among senior managers)
• Environment (hazardous waste, toxic emissions, harmful mining practices, damages to ecosystems)
• Employees (unions, workers' health and safety, retirement benefits, freelance-based or abusive work conditions of the gig economy)
• Governance (fiscality, tax evasion, managers' compensation, ownership, accountability)
• Human rights (labour rights, discrimination based on ethnicity, religion, gender or sexual orientation)
• Products (safety, marketing, antitrust, violation of terms of reference, data mismanagement, planned obsolescence, addictive functionalities)
Has the organization that makes the solution available been involved in irresponsible corporate actions in the past decade, or is it currently involved in such actions?
o No
o Yes, thus exclude
5.2 D/AI solutions that lack minimal responsibility requirements
There are situations where the tool cannot be applied meaningfully because basic minimal requirements for avoiding irresponsible practices in the D/AI industry are not met. D/AI solutions that must be excluded from the assessment may involve four irresponsible practices, defined as follows:
• Data reselling is the primary business model: The main business model of an organization that makes a D/AI solution available to users for free or at low cost can be to generate revenues through data reselling. When a solution exists primarily to generate revenues through means that are not disclosed to users or are intentionally obscured, it should be excluded from the assessment;
• The D/AI solution is deliberately deceptive: A D/AI solution, such as a text- or voice-based chatbot, can appear or be presented to users as if it were a real human interacting with them. When the non-human nature of the solution is not disclosed to users and there are no reminders of it, the solution should be excluded from the assessment;
• There is a lack of cybersecurity and personal data protection: When an organization that makes a D/AI solution available to users has not established proper measures to guarantee cybersecurity and personal data protection, its solution should be excluded from the assessment;
• The AI relies on biased datasets: The dataset used to train an AI solution may be biased, may produce results that cannot be generalized to the entire population of intended users, may lead to unfair decisions against particular individuals or groups, or may foster discriminatory behaviours. When the appropriateness of the dataset used to train the algorithmic system has not been properly validated, the solution should be excluded from the assessment.
Does the organization behind the D/AI solution generate revenue primarily through data reselling, deliberately deceive its users, fail to meet established cybersecurity and personal data protection standards, or rely on biased datasets for its AI?

This section contains four questions. You are invited to rate on a five-level scale the importance and clarity of one responsibility attribute and make suggestions for its scale and any other additional attributes.
The Assessment step relies on twelve responsibility attributes organized into five value domains. Two of the nine original RIH Assessment Tool attributes were substantially modified (Frugality and Eco-responsibility) and three new attributes were introduced (Human agency, Human-centered interoperability and Data governance).
While we only need your input on these five attributes, we show all attributes so that you can also look at the four-level scales used in the original RIH Assessment Tool. Based on the input we gather in Round 1 of the e-Delphi survey, we will consolidate the new attributes and develop their scales. You will then be asked to appraise them in Round 2.

Population health value domain
The Population health value domain relies on four attributes that aim to capture whether the D/AI solution: 1) addresses an important burden of disease; 2) enables humans to exert their autonomy; 3) identifies means to mitigate the ethical, legal and social issues its use may raise; and 4) tackles health inequalities.
The following attribute is part of the original RIH Assessment Tool and does not need to be assessed. Looking at its four-level scale may help you think about the scales that need to be developed for the new attributes.

Health relevance
Refers to the relative importance of the health needs addressed by the innovation within the overall burden of disease, considering the causes of death, injury and disability and the associated risk factors in the region where the intended users are located. Metrics of health relevance include number of deaths, disability-adjusted life years (DALYs), years lived with disability (YLDs), years of life lost (YLLs), and prevalence and incidence rates. Recent data for such measures (at the global, national or regional level) can be found in the Global Burden of Disease Study of the Institute for Health Metrics and Evaluation.
The D/AI solution addresses a cause of death, injury or disability or a risk factor falling within:
A. The top quarter of all causes of death, injury or disability or risk factors (75% and above)
B. The upper middle quarter (50% to 74%)
C. The lower middle quarter (26% to 49%)
D. The bottom quarter (the lowest 25%)
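The Health relevance scale ranks the addressed cause or risk factor against all others in the users' region. The sketch below shows one way to turn burden figures (for instance DALYs taken from the Global Burden of Disease Study) into a rank and then into A-D. Two assumptions are made: the rank is the percentage of listed causes whose burden is at or below the target's (so the heaviest cause scores 100), and any rank above 25 that is below 50 falls into C, which closes the gap between "the lowest 25%" and "26%". Function names are hypothetical.

```python
def percentile_rank(target: str, burden: dict[str, float]) -> float:
    """Percentage of listed causes/risk factors whose burden is <= the target's."""
    if target not in burden:
        raise KeyError(target)
    value = burden[target]
    at_or_below = sum(1 for v in burden.values() if v <= value)
    return 100 * at_or_below / len(burden)


def health_relevance_level(rank: float) -> str:
    """Map a 0-100 rank onto the A-D scale of the Health relevance attribute."""
    if not 0 <= rank <= 100:
        raise ValueError("rank must lie between 0 and 100")
    if rank >= 75:
        return "A"  # top quarter
    if rank >= 50:
        return "B"  # upper middle quarter
    if rank > 25:
        return "C"  # lower middle quarter (assumed to start just above 25)
    return "D"      # bottom quarter
```

Other percentile-rank conventions exist; whichever one is adopted should be reported with the assessment, since it can shift a cause across a quarter boundary.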

Human agency
Refers to the capacity of individuals and groups to exert their autonomy when a given D/AI solution is in use. To enable, encourage and protect human agency, a D/AI solution seeks to empower its users, and society more broadly, by giving them full control over the way data about them is collected and used. Human agency is more likely to be achieved when procedures enable individuals and groups:
• To know when a D/AI solution is in use;
• To fully understand a given algorithmic decision or advice;
• To know how to contest such decision or advice;
• To engage in their own preferred course of action, without undue pressure or personal or professional prejudice;
• To be heard and have their rights protected when discrepancies arise.
Q 6.2.1 How important is this attribute?

The following attribute is part of the original RIH Assessment Tool and does not need to be assessed. Looking at its four-level scale may help you think about the scales that need to be developed for the new attributes.

Means to mitigate Ethical, Legal and Social Issues (ELSIs)
ELSIs refer to a D/AI solution's positive and negative impacts on the moral and sociocultural well-being of individuals and groups and to the legal and regulatory issues its use raises. Although not all ELSIs can be identified at an early stage, a responsible D/AI solution identifies the means by which negative impacts can be mitigated, which may include:
• For ethical issues: user-friendly terms of reference, patient decision aids, psychological support, group empowerment, practice guidelines, etc.
• For legal and regulatory issues: laws and regulatory frameworks regarding discrimination (health insurance, the workplace), individual rights, data stewardship (cybersecurity reporting, right to contest algorithmic decisions), privacy, confidentiality, adverse event monitoring, etc.
• For social issues: user support staff knowledgeable about the context where users are located, stigma-reduction programs, caregiver support, community-led educational forums, return-to-work strategies, etc.
Means to mitigate the negative impacts of the innovation are available for:
A. All three categories of ELSIs
B. Two categories of ELSIs
C. One category of ELSIs
D. None of these categories

The following attribute is part of the original RIH Assessment Tool and does not need to be assessed. Looking at its four-level scale may help you think about the scales that need to be developed for the new attributes.

Health inequalities
Refers to the avoidable differences in health status across individuals and groups that are associated with one's socioeconomic status, social position and capabilities (skills, knowledge, perceived self-efficacy, social network, etc.). Groups who suffer a greater burden of mortality and morbidity because of who they are or where they grow up, live and work are considered vulnerable. Such groups include, but are not limited to:
• Subsistence farmers, long-term unemployed, informally employed, seasonal/daily workers
• People living in deprived urban or rural areas, living in poverty, experiencing homelessness, living with disabilities, living with mental illnesses
• Visible minority groups, asylum seekers, refugees, socially marginalized groups (e.g., lesbian, gay, bisexual, transgender and queer [LGBTQ+], low literacy, etc.).

The D/AI solution:
A. Reduces inequalities by being responsive to the specific capabilities and needs of a vulnerable group
B. May contribute to the reduction of inequalities since the ability to benefit from the innovation is not affected by one's socioeconomic status, social position or capabilities
C. May contribute to the increase of inequalities since the ability to benefit from the innovation is affected by one's socioeconomic status, social position or capabilities
D. Increases inequalities by catering to the specific needs of groups whose socioeconomic status, social position or capabilities are among the highest

6.5 Additional attributes
Q 6.5.1 Are there additional attributes you think are required to assess the population health value of D/AI solutions?
Yes No
Q 6.5.2 If you answered YES, please briefly explain the additional attribute(s) you have in mind.
[free text; if available, please provide a scientific or grey literature reference]

Section 7 - The assessment step: Health system value domain
This section contains four questions. You are invited to rate on a five-level scale the importance and clarity of one responsibility attribute and make suggestions for its scale and any other additional attributes.

Health system value domain
The Health system value domain relies on four attributes aiming to capture whether: 1) the innovation design processes were inclusive; 2) the D/AI solution addresses an important system-level challenge in health and social care; 3) the solution is interoperable with existing digital health and social care infrastructures; and 4) the level and intensity of care required by the solution foster health system sustainability.
The following attribute is part of the original RIH Assessment Tool and does not need to be assessed. Looking at its four-level scale may help you think about the scales that need to be developed for the new attributes.

Inclusiveness
Refers to the degree of stakeholder engagement in the design, development, prototyping and testing of a D/AI solution. Different methods (e.g., codesign, interviews, hackathons, citizen juries, focus groups, workshops, pilot testing, user assessment and feedback) can be used to engage different types of stakeholders (e.g., health and social care practitioners, decision makers, patients, relatives, community and civil society representatives).
Involving a diverse and relevant set of stakeholders through an accountable method is likely to improve a D/AI solution. Hence, RIH calls for making explicit the rationale and scope of the stakeholder engagement process and its impact on the innovation's design and delivery.

Those who developed the D/AI solution:
A. Engaged a diverse and relevant set of stakeholders through a formal method and explained how their input was integrated into the design process
B. Engaged a diverse and relevant set of stakeholders through a formal method, but did not explain how their input was integrated into the design process
C. Either engaged a limited set of stakeholders or did not explain the method used
D. Did not engage stakeholders

The following attribute is part of the original RIH Assessment Tool and does not need to be assessed. Looking at its four-level scale may help you think about the scales that need to be developed for the new attributes.

Responsiveness
Refers to the ability to provide dynamic solutions to existing and emerging challenges in health systems. To support health system sustainability, a responsible D/AI solution should address system-level challenges, which may include:
• Demographic shifts (ageing, populations affected by climate change, war or conflicts)
• Epidemiologic shifts (chronic diseases, new or re-emerging infectious diseases, orphan diseases)
• Human resources hurdles (training, supervision, turnover)
• Service delivery gaps (accessibility, quality, patient-centeredness)
• Knowledge gaps (data analysis and interpretation, development and implementation of knowledge-based tools)
• Governance gaps (coordination, intersectoral action, community partnerships)
The D/AI solution addresses:
A. A system-level challenge that is documented as being of high importance in the target region
B. A system-level challenge that is documented as being of moderate importance in the target region
C. A system-level challenge that is documented as being of low importance in the target region
D. No specific system-level challenges

Human-centered interoperability
Refers to the ability of the D/AI solution to easily communicate and work with the digital infrastructures already in use in the clinical and non-clinical environments in which its users operate (e.g., the patient's home, community organizations, hospitals, public transit, etc.). A responsible D/AI solution should interface seamlessly with its users' established data management practices, without creating additional cognitive and administrative burden. To achieve human-centered interoperability, a D/AI solution may possess the following characteristics:
• Non-proprietary software solutions are used;
• Data sharing functionalities are aligned with the capabilities and needs of different users;
• Users are swiftly informed about the impact of maintenance activities, updates or end-of-life (EOL) transition on interoperability;
• Data sharing functionalities robustly 'follow the patient' across non-clinical and clinical environments.

Each attribute in the final version of the tool will be accompanied by its corresponding four-level Likert-like scale, ranging from A to D, where A implies a high degree of responsibility and D implies no particular signs of responsibility.

Q 7.3.4 For the Human-centered interoperability attribute, what procedures, characteristics or properties should the A on the scale emphasize? [free text]
The following attribute is part of the original RIH Assessment Tool and does not need to be assessed. Looking at its four-level scale may help you think about the scales that need to be developed for the new attributes.

Level and intensity of care
Refers to the principle of subsidiarity according to which the most decentralized unit in the health and social care system, including the patient, should be mobilized to provide the service when it is possible to do so effectively and safely. Subsidiarity may be achieved, for instance, by supporting patients' capacity for self-care, by enabling proper follow-up by general practitioners and community health and social care providers, or by reducing unnecessary interventions at the most specialized level of care of the health system.
While many D/AI solutions may target the patient as the primary user, proper follow-up with formal care providers may still be required (e.g., chronic diseases, mental health, rehabilitation, etc.). To support health system sustainability, a responsible D/AI solution should seek to generate high-quality outcomes while optimizing labour intensity.
The solution was designed to be used safely and effectively mostly under the care of:
A. The patient, an informal caregiver or a health and social care provider operating in a non-clinical environment
B. The patient, an informal caregiver or a health and social care provider operating in a primary health care facility
C. Health and social care providers operating in a secondary or intermediate level of care facility
D. Health and social care providers operating at the most specialized level of care within the health system

7.5 Additional attributes
Q 7.5.1 Are there additional attributes you think are required to assess the health system value of D/AI solutions?
Yes No

This section contains four questions. You are invited to rate on a five-level scale the importance and clarity of one responsibility attribute and make suggestions for its scale and any other additional attributes.

Economic value domain
Frugality
The Economic value domain relies on the concept of frugality which highlights the ability to deliver greater value to more people by using fewer resources such as capital, materials, energy and labour time. Designers of frugal innovation aim to substantially reduce the costs of production, use and maintenance of an innovation, focus on the core functionalities its users require and optimize its performance level considering the intended purpose and context of use.
This attribute always applies to software, and it applies to hardware when: 1) its raison d'être lies with a D/AI solution; and 2) it is needed to deliver the solution's service.
• For instance, a portable finger sensor enabling patients to make the ECG recordings that an AI solution uses to detect cardiac problems meets these two criteria: its raison d'être lies with the AI solution because it fulfills no other purpose, and it is a minimal requirement because the AI solution cannot detect cardiac problems without it. The sensor influences the responsibility of the AI solution because it is a necessary component that would otherwise not exist.
• In contrast, the smartphone or tablet the patient uses to access the same ECG-based AI solution falls outside the scope of the assessment because one condition is not met: the raison d'être of a smartphone or tablet is not to support this particular AI solution.
The following attribute is part of the original RIH Assessment Tool and does not need to be assessed. Looking at its four-level scale may help you think about the scales that need to be developed for the new attributes.

Hardware frugality (when applicable)
The economic value of a D/AI solution may be increased when its hardware incorporates three frugal innovation characteristics:
• Affordability, which may result from optimized hardware production processes and/or lower maintenance needs;
• Focus on core functionalities and ease of use in order to meet the requirements of a larger number of users (e.g., in rural, isolated, remote or resource-constrained settings, etc.);
• Optimized performance, which maximizes the fit between the hardware's characteristics and its context of use (e.g., robustness if used in difficult climatic conditions, high autonomy if used in remote settings, economies of scale if used in large centers, etc.).

Software frugality
The economic value of a D/AI solution may be increased when its software incorporates three frugal innovation characteristics:
• Affordability, which may result from an optimized software development strategy and lower maintenance needs;
• Focus on core functionalities and ease of use in order to meet the digital capabilities of a larger number of users (e.g., speaking different languages, with physical and/or cognitive limitations, lacking onsite technical support, etc.);
• Optimized performance, which maximizes the fit between the software and the digital capacities in the context of use of the solution (e.g., adapted to settings where connectivity is compromised or data plans are unaffordable, etc.).

Section 9 - The assessment step: Organizational value domain
This section contains four questions. You are invited to rate on a five-level scale the importance and clarity of one responsibility attribute and make suggestions for its scale and any other additional attributes.

Organizational value domain
The Organizational value domain relies on two attributes aiming to capture the extent to which the organization that produces the D/AI solution has: 1) developed a business model that can provide more value to users, purchasers, and society; and 2) full control over the entire lifecycle of the data its D/AI solution gathers, exploits, generates, archives and/or shares with users and with third parties (voluntarily or not).
The following attribute is part of the original RIH Assessment Tool and does not need to be assessed. Looking at its four-level scale may help you think about the scales that need to be developed for the new attributes.

Business model
Refers to the components through which an organization creates, delivers and captures social and economic value. A business model typically entails a tension between the redistribution of financial returns to shareholders and the provision of a high-quality D/AI solution. The business model of organizations that seek to provide more value to users, purchasers and society may possess the following characteristics:
• Pursue a social and/or environmental mission, operate on a not-for-profit basis or reinvest the majority of revenues in their mission (e.g., social enterprises);
• Make the solution and its hardware components (see Scope of assessment) freely usable or exploitable by others (e.g., open source, product licensing waivers, do-it-yourself);
• Adopt a pricing scheme based on ability to pay or on a redistributive logic (e.g., fees modulated according to user segments);
• Employ people with particular needs (e.g., low literacy, disabilities);
• Comply with social responsibility programs (e.g., Certified B Corporation, SA8000 standard for decent work, ISO 26000 for social responsibility).
The business model of the organization that makes the D/AI solution available to end users possesses:
A. Three or more of the characteristics described
B. Two of the characteristics described
C. One of the characteristics described
D. None of the characteristics described

Data governance
Refers to the stewardship, structures and processes the organization sets in place to ensure full control over the entire lifecycle of the data it gathers, exploits, generates, stores and/or shares with users and third parties (voluntarily or not). From data collection to data destruction, the organization and its leaders must remain transparent about, accountable for, and swiftly responsive to any breaches in data protection and any other issues affecting the lifecycle of the D/AI solution's data.
Organizations producing responsible D/AI solutions ensure that their high-level executives and employees are knowledgeable about, and able to report to external auditors on, the sensitivity and scope of use of all datasets linked to their solutions. Procedures to achieve responsible data governance include:
• Fully active oversight committees whose members' conflicts of interest are publicly declared;
• Explicit compliance with the laws and regulatory frameworks where users are located;
• Adherence to industry standards specific to D/AI solutions (e.g., ISO 13482:2014 for safety of personal care robots, ISO/TS 82304-2 for quality and reliability of health and wellness apps);
• Training programs and certification systems for in-house data stewards;
• Fully functional and active reporting systems.
Q 9.2.1 How important is this attribute?

This section contains four questions. You are invited to rate on a five-level scale the importance and clarity of one responsibility attribute and make suggestions for its scale and any other additional attributes.

Eco-responsibility
The Environmental value domain relies on the concept of eco-responsibility which refers to a product, process or method that reduces the negative environmental impacts of a D/AI solution along its lifecycle.
Like the Frugality attribute, eco-responsibility is assessed separately for hardware (applicable when the raison d'être of the physical components lies with the solution and they are needed to deliver its service) and for programming and software.
The following attribute is part of the original RIH Assessment Tool and does not need to be assessed. Looking at its four-level scale may help you think about the scales that need to be developed for the new attributes.

Hardware eco-responsibility (when applicable)
The responsibility of a D/AI solution can be increased by attending to eco-responsibility concerns at key stages in the lifecycle of its hardware requirements, which include:
• Raw material sourcing (e.g., product or hardware made of recycled or renewable content materials, free of substances such as latex, metals or chemicals that are of major public health concern or harmful and toxic to ecosystems)*
• Manufacturing (e.g., efficient energy consumption, compliance with national or international environmental regulations, reduced solid or water waste)
• Distribution (e.g., packaging, transportation)
• Use (e.g., efficient energy consumption, reusability, durability)
• Disposal (e.g., product or hardware designed to be recycled, disassembled, remanufactured, composted or biologically degraded)
* Arsenic, asbestos, benzene, bisphenol A, bromine- and chlorine-based compounds, cadmium, chromium, dioxin and dioxin-like substances, lead, mercury, phthalates, PVC.
The solution was designed by integrating hardware eco-responsibility concerns at:
A. Three or more key lifecycle stages
B. Two key lifecycle stages
C. One key lifecycle stage
D. None of the key lifecycle stages
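The Hardware eco-responsibility scale is a counting rule over the five lifecycle stages listed above. The sketch below assumes that assessors have already decided, from their sources, which stages integrate eco-responsibility concerns and record them as lowercase stage names; the constant and function names are hypothetical.

```python
# The five key lifecycle stages named in the attribute definition.
LIFECYCLE_STAGES = frozenset({
    "raw material sourcing", "manufacturing", "distribution", "use", "disposal",
})


def hardware_eco_level(stages_addressed: set[str]) -> str:
    """Map the set of lifecycle stages that integrate eco-responsibility onto A-D."""
    stages = set(stages_addressed)
    unknown = stages - LIFECYCLE_STAGES
    if unknown:
        raise ValueError(f"unknown lifecycle stages: {sorted(unknown)}")
    count = len(stages)
    if count >= 3:
        return "A"  # three or more key stages
    return {2: "B", 1: "C", 0: "D"}[count]
```

The Business model scale follows the same counting rule (three or more of its five characteristics for A), so the same pattern would apply with a different list of items.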

Programming and software eco-responsibility
The responsibility of a D/AI solution can be increased by using clean energy sources and minimizing the quantity of energy consumed when training, validating and feeding an algorithmic system or when developing software. Such eco-responsible practices include:
• Choosing programming, modeling or computational techniques that substantially reduce the quantity of energy and time required;
• Using central processing units (CPUs) and computers that are highly energy-efficient (standards);
• Eliminating the use of non-renewable energy sources such as oil, gas and coal;
• Storing and archiving data in data centers and server farms that are net-zero or climate-positive.
Q 10.2.1 How important is this attribute?

Each attribute in the final version of the tool will be accompanied by its corresponding four-level Likert-like scale, ranging from A to D, where A implies a high degree of responsibility and D implies no particular signs of responsibility.

Video presentation
In this 10-minute video presentation, we:
• Clarify the kind of tool we are developing;
• Summarise the changes brought to the tool after Round 1;
• Introduce you to the structure of the Round 2 survey.

Aim of the tool
The aim of the tool under development is to assess the degree of responsibility of digital solutions in health and social care that operate with or without AI (hereafter called "D/AI solutions").
The High-Level Expert Group on AI set up by the European Commission (2019) defines AI systems as software and possibly also hardware systems "that, given a complex goal, act in the physical or digital dimension by perceiving their environment through data acquisition, interpreting the collected structured or unstructured data, reasoning on the knowledge, or processing the information, derived from this data and deciding the best action(s) to take to achieve the given goal." In the tool under development, the term D/AI solutions is used for solutions that operate with or without AI, while recognizing the specificities of AI, which is concisely defined as:
• An algorithmic system that uses data to infer patterns, draw conclusions and/or make decisions; this process may entail supervised as well as unsupervised learning (e.g., machine learning, including deep learning).
For example, an AI solution can be trained to process electrocardiogram (ECG) recordings to detect the presence of cardiac problems and provide a diagnostic interpretation.

What kind of tool are we developing? How do we define responsibility?
The tool is informed by our work on Responsible Innovation in Health (RIH) as well as a corpus of 56 practice-oriented frameworks and tools that aim to support responsibility in D/AI solutions. It is structured around the RIH conceptual framework and follows the logic of the RIH Assessment Tool. The tool under development thus aims to provide a quantitative measure of the degree of responsibility of D/AI solutions in health and social care.
RIH draws on the policy-oriented field of Responsible Research and Innovation (RRI), which aims to steer innovation towards the 'right' societal impacts. RIH approaches responsibility as a matter of degree that can be identified by examining the extent to which an integrated set of process-, product- and organizational-level responsibility attributes are met.
These attributes are not static and thus the RIH Tool establishes whether, at a given point in time, they are present.
The rationale of the tool under development is not to measure 'irresponsibility' but rather to account for the extent to which a given D/AI solution brings us closer to achieving the 'right' health and social care impacts. The latter are defined through a health equity lens as well as a health system economic and environmental sustainability lens.

Trade-offs and synergies between responsibility attributes
The original RIH Assessment Tool helps developers identify potential trade-offs that can be made between responsibility attributes. For example, in certain situations, it may be legitimate to have a lower score on the Health relevance attribute to reach a higher score on the Eco-responsibility attribute (or vice versa). Conversely, upstream design decisions can be made to specifically increase synergies between attributes, thereby augmenting the solution's overall degree of responsibility. For instance, aiming for a higher score on the Inclusiveness attribute can help stakeholders develop a more frugal solution.

Who will apply the tool? At what stage? And to inform what kind of decisions?
The tool was developed to inform the decisions of those who develop D/AI solutions and influence the 'supply side', such as data scientists, programmers, entrepreneurs, investors, research funders and incubators, as well as the decisions of those who influence the 'demand side', including purchasers, implementers and users of D/AI solutions such as patients, clinicians, and health and social care managers.
Like the original RIH Assessment Tool, this tool will be translated into French and Portuguese and made freely accessible and usable with proper academic citation. It will be possible to use it in two ways:
1. As a formal evidence-informed assessment tool: It will be applied by people who possess research skills. Judgment on each attribute must be made by an interdisciplinary team after having searched, retrieved and compiled the relevant sources of information. A formal assessment will be more accurate at the General Availability (GA) stage of a D/AI solution because, at this lifecycle juncture, more robust security, usability and compliance tests have been completed and peer-reviewed studies are more likely to have been published. A formal assessment is thus best performed at the GA stage or later. Repeated application of the tool over time will help track variations in the degree of responsibility of a D/AI solution.
2. As a structured design or procurement brief: The tool describes process-, product- and organizational-level responsibility features that can guide the design, development, purchasing, deployment and use of D/AI solutions. Because the tool's attributes are defined in tangible terms and its four-level scales are described in mutually exclusive terms, the tool can be used as a structured roadmap to guide decisions made either before or after the GA stage.

What falls within the scope of the assessment of a D/AI solution's degree of responsibility?
Because D/AI solutions typically rely on a wide-ranging network of digital devices and infrastructures, we set two conditions to determine what components should be included in the assessment: 1) their 'raison d'être' is to support the D/AI solution; AND 2) they are part of the minimal requirements for the D/AI solution to deliver its service.
• For instance, a portable finger sensor enabling patients to make the ECG recordings that an AI solution uses to detect cardiac problems meets these two criteria:
o its raison d'être lies with the AI solution because it fulfills no other purpose; and
o it is a minimal requirement because the AI solution cannot detect cardiac problems without it.
• In contrast, the smartphone or tablet the patient uses to access the same ECG-based AI solution falls outside the scope of the assessment because one condition is not met:
o the raison d'être of a smartphone or tablet is not to support this particular AI solution.
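The two scope conditions combine with a logical AND: a component enters the assessment only if both hold. The sketch below encodes the ECG example, assuming that the smartphone does satisfy the minimal-requirement condition (the text only states that its raison d'être condition fails). The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    exists_for_solution: bool  # condition 1: its raison d'être is to support the D/AI solution
    minimal_requirement: bool  # condition 2: the solution cannot deliver its service without it


def in_scope(component: Component) -> bool:
    # Both conditions must hold for the component to be assessed.
    return component.exists_for_solution and component.minimal_requirement


components = [
    Component("portable ECG finger sensor", exists_for_solution=True, minimal_requirement=True),
    Component("patient's smartphone or tablet", exists_for_solution=False, minimal_requirement=True),
]
assessed = [c.name for c in components if in_scope(c)]  # only the finger sensor qualifies
```

Failing either condition is enough to leave a component out, which is why the general-purpose smartphone is excluded even though the patient needs it to reach the solution.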

Overview of the tool's components to be appraised in Round 2
Drawing on all Round 1 comments (n=202), we substantially revised the components of the tool where a robust construct quality threshold was not reached and improved many others. Because at least 85% of the experts considered that no additional premises, criteria or attributes were required, we did not introduce new components to the tool. A summary of the results and corresponding changes can be found in Table 1. Our responses to the comments, which include insightful criticisms and suggestions, are listed in this downloadable document. Figure 1 highlights the components of the tool you are asked to appraise in Round 2 of this e-Delphi survey. The premises at the top of the figure clarify the tool's overall approach to responsibility in D/AI solutions.
o You will find the revised premises at the end of the survey; your input on them is again optional in Round 2.
The application of the tool entails a three-step process:
• The screening step aims to determine whether a D/AI solution is eligible for a formal assessment through five inclusion and exclusion criteria.
o We revised three criteria and introduced the sources of information that can be used to apply them.
o We need your input on two exclusion criteria in Round 2.
• The assessment step ascertains the presence of responsibility features through fourteen attributes organized into five value domains. All attributes are assessed through a four-level Likert-like scale, ranging from A to D, where A implies a high degree of responsibility and D implies no particular signs of responsibility.
o We revised five attributes and introduced their scales and the sources of information that can be used in the assessment.
o You are now asked to review these five attributes and appraise their scales in Round 2.
• The rating step determines the result of the assessment with the help of a scoring system that considers the availability and quality of the sources of information used to score each attribute.
o The scoring system has already been validated and is thus presented for your information only.

Quality of the sources of information used to rate the criteria and attributes
The types of information source that can be used to assess each criterion and attribute are indicated in the survey. A simple classification summarizing their quality is used in the scoring system. Because independent organizations and peer-reviewed publications are more likely to be objective in their reporting, they are classified as being of better quality for the tool's assessment purposes.
• Type 1. Low quality (1 point): Technical documentation made available by the organization that produces the D/AI solution.
• Type 2. Moderate quality (2 pts): Reports by multilateral organizations (e.g., WHO, OECD), governments, regulatory agencies, certification bodies or independent not-for-profit organizations that monitor and report on human and labour rights, animal welfare and environmental regulation.
• Type 3. High quality (3 pts): Peer-reviewed scientific articles and systematic reviews of the scientific literature (including Health Technology Assessments, Cochrane Reviews, etc.).
Ready to begin Round 2 of the e-Delphi survey? Please click "Participate" to access the consent form. You may come back to this section at any time.

Section 2 - The Screening step: inclusion criteria
The Screening step relies on two inclusion criteria that are meant to swiftly identify solutions that: 1) meet the D/AI solution definition; and 2) effectively and safely address at least one determinant of health.
The construct quality threshold was reached for the first criterion and the second is part of the original RIH Tool. To see the complete set of criteria and attributes that are part of the RIH Tool, you may download this document.
You may proceed to the next section of the survey.

Section 3 - The Screening step: exclusion criteria
The Screening step relies on three exclusion criteria meant to identify solutions that: 1) have not reached the General Availability (GA) stage; 2) are produced by an organization involved in irresponsible corporate actions; or 3) do not disclose key D/AI risks to users.
To deliver a valid and meaningful responsibility score when the tool is used in a formal evaluation process, the assessment should be made when the D/AI solution has been sufficiently tested, i.e., when the GA stage has been reached. Before this stage, the tool may still be used as a design or procurement brief to inform decisions, but we suggest postponing a formal evaluation process. This is the key purpose of the exclusion criteria.
You are invited to rate on a five-level scale the applicability of two exclusion criteria that were substantially revised considering all participants' comments.

General Availability stage not reached
Criterion definition
While at the Release to Manufacturing (RTM) stage a D/AI solution is of sufficient quality for mass distribution, General Availability (GA) refers to a point where necessary commercialization activities including security, usability and compliance tests have been completed. When a D/AI solution has not reached the GA stage, this tool may be used to inform design or procurement decisions, but we recommend postponing a formal assessment of its degree of responsibility.

Nondisclosure of key D/AI risks
Criterion definition
Regulation of the D/AI industry is currently scant and unevenly enforced within and across countries. Until proper regulation is implemented and enforced, organizations aiming to produce responsible D/AI solutions in health and social care should refrain from reselling data and should publicly disclose their in-house mechanisms to mitigate key risks to users.
There are at least three areas of concern where clear disclosure statements need to be found before applying this tool in a formal evaluation process:
• Data reselling: e.g., an organization can make a D/AI solution freely available to users or at low cost while generating its core revenues by selling user-related data. To avoid any ambiguities regarding its core mission, an organization producing a D/AI solution should refrain from selling data and make its position explicit;
• Cybersecurity and personal data protection: e.g., cybersecurity and personal data protection require proper high-level governance oversight as well as operational procedures. An organization producing a D/AI solution should describe its data protection measures and disclose how risks are monitored and mitigated;
• AI training datasets: e.g., the dataset used to train an AI solution may be biased, may produce results that cannot be generalized to the entire population of intended users, may lead to unfair decisions against particular individuals or groups, or may encourage discriminatory behaviours. An organization producing an AI-based solution should justify the appropriateness of the dataset used to train its algorithm and disclose how potential biases are identified and mitigated.
Question to be answered
Are there public disclosure statements regarding data reselling, cybersecurity and personal data protection, and AI training datasets?
o Exclude from a formal evaluation if the answer is 'no' for any of the three applicable areas of concern
o Yes
Sources of information required at this stage
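The screening logic of Sections 2 and 3 amounts to one check: both inclusion criteria must be met and none of the exclusion criteria may apply. The Python sketch below is an illustration only, not part of the tool; the class, field and function names are hypothetical, and it assumes that the AI training dataset area applies only to AI-based solutions (marked `None` otherwise).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ScreeningAnswers:
    """Hypothetical container for one assessor's answers at the Screening step."""
    # Inclusion criteria: both must be met.
    meets_dai_definition: bool
    addresses_health_determinant: bool  # effectively and safely addresses at least one determinant
    # Exclusion criteria: any one of them keeps the solution out of a formal evaluation.
    reached_general_availability: bool
    irresponsible_corporate_actions: bool
    # Public disclosure statements. None marks an area that does not apply
    # (assumption: the training-dataset area applies only to AI-based solutions).
    discloses_data_reselling: bool
    discloses_data_protection: bool
    discloses_training_datasets: Optional[bool]


def screen(a: ScreeningAnswers) -> Tuple[bool, List[str]]:
    """Return (eligible for a formal evaluation, reasons it is not)."""
    reasons: List[str] = []
    if not a.meets_dai_definition:
        reasons.append("does not meet the D/AI solution definition")
    if not a.addresses_health_determinant:
        reasons.append("does not effectively and safely address a determinant of health")
    if not a.reached_general_availability:
        reasons.append("General Availability stage not reached")
    if a.irresponsible_corporate_actions:
        reasons.append("producer involved in irresponsible corporate actions")
    disclosures = {
        "data reselling": a.discloses_data_reselling,
        "cybersecurity and personal data protection": a.discloses_data_protection,
        "AI training datasets": a.discloses_training_datasets,
    }
    for area, disclosed in disclosures.items():
        # A 'no' (False) for any applicable area triggers exclusion; None is skipped.
        if disclosed is False:
            reasons.append(f"no public disclosure statement on {area}")
    return (len(reasons) == 0, reasons)
```

For a digital solution that does not use AI, passing `None` for the training-dataset disclosure keeps that area from triggering exclusion, while the two other disclosure areas still apply.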

Section 4 - The assessment step: Population health value domain
The Assessment step relies on fourteen responsibility attributes organized into five value domains. This first section is for the Population health value domain and contains two questions. You are invited to assess the clarity of the Human agency attribute definition and the appropriateness of its scale.

Population health value domain
The Population health value domain relies on four attributes that aim to capture whether the D/AI solution: 1) addresses an important burden of disease; 2) supports human agency; 3) identifies means to mitigate the ethical, legal, and social issues its use may raise; and 4) tackles health inequalities.

Human agency
Attribute definition
Refers to the capacity of individuals and groups to actively and independently exert their decision-making autonomy and act in accordance with their goals when using a D/AI solution.
Though D/AI solutions may improve population health by facilitating a range of human decisions and actions, little is known about how D/AI solutions affect user behaviour, cognition and judgement in practice (e.g., overreliance, avoidance, overconfidence, hypervigilance), and thus about their impact on care-seeking behaviours and on health and social care provision. Responsible D/AI solutions can support human agency when there are human oversight procedures that enable individuals and groups:
• To understand a D/AI solution's outputs, that is, the measures, recommendations or decisions it produces (e.g., data visualization and interpretation, decision tree, plain language recommendation, transparency if an AI-based solution is unexplainable);
• To discuss these outputs with properly trained staff when needed (e.g., dedicated point of service) and decide their own preferred course of action without undue pressure from the D/AI solution itself or from peers (e.g., freedom to use one's judgement or override an AI-based decision, education and training, guidelines);
• To have their concerns heard and acted upon through formal human oversight mechanisms (e.g., committees for audit, review, appeal, and redress, excluding chatbots, call centers, digital contact forms, etc.).*
* The High-level expert group on AI convened by the European Commission (2019) describes three levels of human oversight for AI: 1) human-in-the-loop: human intervention is found in every decision cycle of an AI-based solution (which may often prove neither possible nor desirable); 2) human-on-the-loop: humans intervene in the design of the solution and the monitoring of its operations; and 3) human-in-command: humans oversee the overall use of the solution, can decide whether to use it or not, determine the level of human discretion regarding when and how to use it, and override a decision made by it.

Scale
The D/AI solution is accompanied by:
A. Procedures that enable users to understand its outputs, decide and act in accordance with their own goals, and include formal means to have their concerns acted upon
B. Procedures that enable users to understand its outputs and decide and act in accordance with their own goals, but do not include formal means to have their concerns acted upon
C. Procedures that enable users either to understand its outputs or decide and act in accordance with their own goals, but do not include formal means to have their concerns acted upon
D. No particular human oversight procedures

Information sources
• Type 1 info describing the human oversight procedures accompanying the D/AI solution.
• Type 2 or Type 3 info examining the effectiveness of the human oversight procedures accompanying the D/AI solution.

Section 5 - The assessment step: Health system value domain
This section contains three questions. You are invited to assess the importance and clarity of the Systemic interoperability attribute and the appropriateness of its scale.

Health system value domain
The Health system value domain relies on four attributes aiming to capture whether: 1) the innovation design processes were inclusive; 2) the D/AI solution addresses an important system-level challenge; 3) it supports care-centric interoperability across clinical and non-clinical settings; and 4) the level and intensity of care it requires foster health system sustainability.

Systemic interoperability
Attribute definition
Refers to how smoothly a D/AI solution can operate within and across the clinical and non-clinical settings where users provide care, manage care, receive care, or take care of themselves (e.g., hospitals, clinics, the patient's home, community organizations) without adding significant cognitive and/or administrative burden to users.
Because the growing use of D/AI solutions may increase fragmentation of care, duplication in data collection processes, vendor lock-ins, or data sharing hurdles across settings (e.g., hospital care units, health and social care system organizations), responsible D/AI solutions are designed to be readily adoptable within their users' data management practices and are adjusted over time to communicate and work seamlessly with their digital infrastructures. Care-centric interoperability can be achieved by:
• Designing a solution that is operable on widely available systems and devices and aligns with user capabilities, needs, work processes and task allocation to minimize cognitive and administrative burden;
• Providing data sharing functionalities that are well thought through in light of administrative processes and clinical pathways (e.g., 'following the patient' across care settings when relevant) and using nonproprietary software or solutions that facilitate data exportation;
• Testing the D/AI solution in the context of use before its full deployment and regularly assessing how it interfaces with users' evolving digital infrastructures (e.g., robustness, reliability, traceability).

Scale
The D/AI solution:
A. Is periodically adjusted to fit users' digital infrastructures, aligns with their data management practices, and provides all relevant data sharing functionalities
B. Is periodically adjusted to fit users' digital infrastructures and aligns with their data management practices, but provides limited data sharing functionalities
C. Either requires substantial adaptations to users' data management practices or provides limited data sharing functionalities
D. Requires substantial adaptations to users' data management practices and provides limited data sharing functionalities

Section 6 - The assessment step: Economic value domain
You are invited to assess the importance and clarity of the Software frugality attribute and the appropriateness of its scale.

Economic value domain
Frugality
The Economic value domain relies on the concept of frugality, which highlights the ability to deliver greater value to more people by using fewer resources such as capital, materials, energy, and labour time. Designers of frugal innovation aim to substantially reduce the costs of production, use and maintenance of an innovation, focus on the core functionalities its users require, and optimize its performance level considering the intended purpose and context of use.
Frugality is easily overlooked in the health and social care domain, but it clearly matters to its future. First, most healthcare systems in industrialized countries, be they publicly or privately funded, are struggling with the introduction of increasingly costly and labour-intensive products and services (e.g., gene therapies may cost US$2 million per patient per treatment). Second, as shortcomings in globalized supply chains become more acute, there is an undeniable need to use the natural and economic resources that go into the production of goods and services much more wisely. Third, taking heed of frugal design principles will enable D/AI developers to benefit a greater number of patients within and across countries.
The Frugality attribute always applies to software, and it applies to hardware when: 1) the hardware's raison d'être lies with a D/AI solution; and 2) it is needed to deliver the solution's service.
• For instance, a portable finger sensor enabling patients to make the ECG recordings that an AI solution uses to detect cardiac problems meets these two criteria: its raison d'être lies with the AI solution because it fulfills no other purpose, and it is a minimal requirement because the AI solution cannot detect cardiac problems without it. The sensor influences the responsibility of the AI solution because it is a necessary component that would otherwise not exist.
• In contrast, the smartphone or tablet the patient uses to access the same ECG-based AI solution falls outside the scope of the assessment because one condition is not met: the raison d'être of a smartphone or tablet is not to support this particular AI solution.

Software frugality
Attribute definition
The economic value of a D/AI solution may be increased when its software incorporates three frugal innovation characteristics:
• Affordability, which may result from optimized software development strategies, open-source programming tools, and/or low technical support, update, and maintenance needs;
• Focus on core functionalities and ease of use in order to meet the needs and capabilities of a larger number of users (e.g., universal interface design for users with low literacy, physical and/or cognitive limitations, cognitive ergonomics);
• Optimized performance, which maximizes the fit between software functionalities and requirements and location-dependent digital capacities (e.g., edge-computing for settings where connectivity is compromised or data plans unaffordable).

Scale
The D/AI solution incorporates…
A. All three characteristics of software frugality
B. Two characteristics of software frugality
C. One characteristic of software frugality
D. No characteristics of software frugality

Information sources
• Type 1, Type 2 or Type 3 info describing the D/AI solution's core functionalities, usability and costs, and the resources required for its production, use and maintenance.
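Because the Software frugality scale simply counts how many of the three characteristics a solution incorporates, it can be expressed as a lookup. The sketch below is a hypothetical illustration, not the tool's scorecard; the short characteristic labels and the function name are our own.

```python
from typing import Iterable

# The three characteristics named in the Software frugality definition (labels are ours).
SOFTWARE_FRUGALITY_CHARACTERISTICS = frozenset(
    {"affordability", "core functionalities and ease of use", "optimized performance"}
)

# Number of characteristics incorporated -> scale level.
LEVEL_BY_COUNT = {3: "A", 2: "B", 1: "C", 0: "D"}


def software_frugality_level(found: Iterable[str]) -> str:
    """Map the characteristics documented for a D/AI solution to levels A-D."""
    chars = set(found)
    unknown = chars - SOFTWARE_FRUGALITY_CHARACTERISTICS
    if unknown:
        raise ValueError(f"not a software frugality characteristic: {sorted(unknown)}")
    return LEVEL_BY_COUNT[len(chars)]
```

Validating the labels first guards against counting an unrelated feature as a frugality characteristic, which would otherwise inflate the level.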

Section 7 - The assessment step: Organizational value domain
This section contains one question. You are invited to assess the appropriateness of the scale of the Data governance attribute.

Organizational value domain
The Organizational value domain relies on two attributes aiming to capture the extent to which the organization that produces the D/AI solution has: 1) developed a business model that can provide more value to users, purchasers, and society; and 2) proper control over the entire lifecycle of the data its D/AI solution gathers, exploits, generates, archives and/or shares with users and third parties (voluntarily or not).

Data governance
Attribute definition
Refers to the stewardship, structures and processes the organization sets in place to ensure full control over the entire lifecycle of the data it gathers, exploits, generates, stores, and/or shares with users and third parties (voluntarily or not). From data collection to data destruction, the organization and its leaders must remain transparent about, publicly accountable for, and swiftly responsive to any breaches or issues affecting the D/AI solution data's lifecycle.
Responsible data governance makes high-level executives and employees knowledgeable about, and able to assess and report on, the sensitivity and scope of use of all datasets linked to the D/AI solution. This can be achieved through an integrated set of procedures:
• Defining performance indicators for organizational data protection practices (e.g., certifiable standards such as ISO/IEC 27001 for information security management) and for the D/AI solution (e.g., ISO 13482 for safety of personal care robots, ISO/TS 82304-2 for quality and reliability of health and wellness apps, ISO/IEC 42001 for AI management systems);
• Securing an ongoing training and certification program for managers and employees to be properly skilled in data management (e.g., data stewards);
• Assigning a high-level team, accountable to the board of directors, that monitors the way employees gather, exploit, generate, store, and/or share data and informs users and/or the public of any breaches and incidents;
• Integrating the above-described procedures into a reporting system that is auditable by a third party.
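The Data governance scale that follows nests its levels: A and B both require performance indicators and training programs, A adds third-party auditability, and B adds a high-level team accountable to the board of directors. The sketch below is a hypothetical reading of that scale with illustrative parameter names. It assumes that indicators plus training with neither a high-level team nor a third-party audit fall to level C, a case the scale does not state explicitly.

```python
def data_governance_level(
    performance_indicators: bool,
    training_program: bool,
    high_level_team: bool,
    third_party_audit: bool,
) -> str:
    """Map the documented data governance procedures to levels A-D."""
    both = performance_indicators and training_program
    if both and third_party_audit:
        return "A"  # indicators and training, auditable by a third party
    if both and high_level_team:
        return "B"  # indicators and training under a board-accountable high-level team
    if performance_indicators or training_program:
        # Assumption: indicators and training with neither a high-level team
        # nor a third-party audit are also treated as level C.
        return "C"
    return "D"  # none of these procedures
```

Checking the most demanding level first means a solution that satisfies both A and B conditions is scored A.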

Scale
Control over the D/AI solution's data lifecycle relies on:
A. Data governance procedures that include performance indicators and training programs auditable by a third party
B. Data governance procedures that include performance indicators and training programs under the responsibility of a high-level team accountable to the board of directors
C. Data governance procedures that include either performance indicators or training programs
D. None of these procedures

Information sources
• Type 1 info describing the organization's data governance procedures.
• Type 2 or Type 3 info examining the quality and outcomes of the organization's data governance procedures.

Section 8 - The assessment step: Environmental value domain
This section contains two questions. You are invited to assess the importance of the Programming and software eco-responsibility attribute and the appropriateness of its scale.

Eco-responsibility
The Environmental value domain relies on the concept of eco-responsibility, which refers to products, processes or methods that reduce the negative environmental impacts of a D/AI solution along its lifecycle.
Like the Frugality attribute, eco-responsibility is assessed separately for hardware (applicable when the raison d'être of the physical components lies with the solution and they are needed to deliver its service) and for programming and software.
The carbon footprint of current computational infrastructures is close to that of the global airline industry and is predicted to double by 2025. Simple hardware modifications can cut in half the energy consumed by software procedures, and "coordinated changes in software and hardware could increase the energy efficiency of computing by a million times" (MIT Energy Initiative).

Programming and software eco-responsibility
Attribute definition
Responsibility of a D/AI solution can be increased by using clean energy sources and reducing as much as possible the quantity of energy consumed when training, validating, and feeding an algorithmic system, when archiving data or when developing software. Software design decisions may also affect the quantity of energy used to operate the D/AI solution. Eco-responsible programming and software practices may include:
• Choosing programming, modeling or computational techniques that substantially reduce the quantity of energy and time required to develop a D/AI solution (e.g., tinyML);
• Using Central Processing Units (CPUs) and computers that are highly energy-efficient (e.g., chips and circuits reducing heat transfer);
• Storing and archiving data in data centers and server farms where greenhouse gas emissions (GHGs) are reduced to a minimum (net zero) or where more GHGs are removed from the atmosphere than emitted (climate positive) (e.g., ISO/IEC 13273-1:2015 for Energy efficiency and renewable energy sources).

Scale
The D/AI solution relies on:
A. Three practices of programming and software eco-responsibility or more
B. Two practices of programming and software eco-responsibility
C. One practice of programming and software eco-responsibility
D. None of the programming and software eco-responsibility practices

Information sources
• Type 1 info describing whether and how the environmental impacts of programming, software development and data processing, storing, and archiving are addressed.
• Type 2 or Type 3 info analyzing the environmental impacts of programming, software development and data processing, storing, and archiving.
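Unlike the Software frugality scale, this scale reaches level A with "three practices or more", since the list of eco-responsible practices in the definition is open-ended ("may include"). A hypothetical sketch of the count-to-level mapping, with illustrative practice labels that are not part of the tool:

```python
from typing import Iterable


def eco_responsibility_level(practices: Iterable[str]) -> str:
    """Map the number of distinct programming/software eco-responsibility practices to A-D."""
    n = len(set(practices))  # count each documented practice once
    if n >= 3:
        return "A"  # three practices or more
    if n == 2:
        return "B"
    if n == 1:
        return "C"
    return "D"  # none of the practices


# Hypothetical example: two documented practices.
example = {"low-energy modeling technique (e.g., tinyML)", "energy-efficient CPUs"}
print(eco_responsibility_level(example))  # B
```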

Section 9 - Scoring system
This section does not contain any survey questions. It explains the scoring system that is part of the original RIH Tool.
The tool should be applied in a transparent and accountable way. To this end, a scorecard to calculate and report the overall responsibility score will be made available as an Excel spreadsheet. Detailed extracts from the sources of information justifying the score given to each attribute should be reported in this scorecard along with a list of references. Because the responsibility attributes of a D/AI solution are not static, the overall responsibility score reflects, at a given point in time, the extent to which an integrated set of process-, product- and organizational-level responsibility attributes are met.
The scoring system relies on two interrelated components.

Component 1. Availability and quality of the sources of information
The assessment relies on a sufficient number of attributes when at least 11 of the 14 attributes are documented.
Number of attributes documented < 11/14 → The assessment is compromised by missing information
Number of attributes documented ≥ 11/14 → The assessment covers key aspects of responsible D/AI solutions
The scorecard indicates the sources of information used to score each attribute and the points associated with these sources. If more than one type of source is used for an attribute, the source of highest quality is retained and rated as follows:
o Type 1. Low quality (1 point): Technical documentation made available by the organization that produces the D/AI solution.
o Type 2. Moderate quality (2 pts): Reports by multilateral organizations, governments, regulatory agencies, certification bodies or independent not-for-profit organizations that monitor and report on human and labour rights, animal welfare and environmental regulation.
o Type 3. High quality (3 pts): Peer-reviewed scientific articles and systematic reviews of the scientific literature.
The overall quality of the sources of information is determined by calculating the mean value of the points obtained and is interpreted as follows:
Mean score < 2: Low to moderate quality → The assessment is compromised by information sources of inferior quality
Mean score ≥ 2: Moderate to high quality → The assessment is based on information sources of superior quality
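Component 1 can be computed mechanically from the scorecard. The sketch below is not the official Excel scorecard: it assumes a hypothetical input format (each attribute mapped to the set of source types used to document it, an empty set meaning undocumented) and assumes that the mean is taken over documented attributes only, which the text does not state explicitly.

```python
from typing import Dict, Set, Tuple

# Points per source type, as listed in Component 1.
POINTS_BY_SOURCE_TYPE = {1: 1, 2: 2, 3: 3}
MIN_DOCUMENTED = 11  # out of 14 attributes


def source_quality(sources: Dict[str, Set[int]]) -> Tuple[int, float, bool, bool]:
    """Summarize Component 1.

    Returns (attributes documented, mean points,
             enough attributes documented, superior-quality sources).
    """
    # When several source types document an attribute, keep the highest-quality one.
    points = [POINTS_BY_SOURCE_TYPE[max(types)] for types in sources.values() if types]
    documented = len(points)
    # Assumption: undocumented attributes are left out of the mean.
    mean_points = sum(points) / documented if documented else 0.0
    return (
        documented,
        mean_points,
        documented >= MIN_DOCUMENTED,
        mean_points >= 2,
    )
```

For example, eleven attributes documented with Type 2 sources, one documented with both Type 1 and Type 3 sources (scored 3), and two undocumented give 12 documented attributes and a mean of 25/12, so both Component 1 requirements are met.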

Component 2. Responsibility features of the D/AI solution
The attributes rely on a four-level Likert-like scale, where:
A = a high degree of responsibility (5 pts)
B = a moderate degree of responsibility (4 pts)
C = a low degree of responsibility (2 pts)
D = no particular signs of responsibility (1 pt)
The overall responsibility features score is determined by calculating the mean value of the points obtained, which will fall within one of the following four intervals:
Lastly, to interpret the overall score, one must consider whether the assessment relies on: i) a sufficient number of documented attributes (≥11/14); AND ii) information sources of superior quality (mean score ≥2).
→ When one of these two requirements is not met, the score is not considered meaningful.
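Component 2 and the final interpretation rule can be sketched the same way. Because the four score intervals are not reproduced in this text, the hypothetical functions below only compute the mean of the points and check the two Component 1 requirements (at least 11 of 14 attributes documented and a mean source-quality score of at least 2); the function names are illustrative.

```python
from typing import Dict

# Points for each scale level, as listed in Component 2 (note the uneven spacing).
POINTS_BY_LEVEL = {"A": 5, "B": 4, "C": 2, "D": 1}


def responsibility_features_score(levels: Dict[str, str]) -> float:
    """Mean points over the scored attributes (attribute name -> 'A'..'D')."""
    if not levels:
        raise ValueError("no attribute has been scored")
    points = [POINTS_BY_LEVEL[level] for level in levels.values()]
    return sum(points) / len(points)


def score_is_meaningful(attributes_documented: int, mean_source_points: float) -> bool:
    """Both requirements must hold: >= 11/14 attributes documented and mean source quality >= 2."""
    return attributes_documented >= 11 and mean_source_points >= 2
```

With one attribute at each level, the score is (5 + 4 + 2 + 1) / 4 = 3.0; whether that score is interpreted at all depends on `score_is_meaningful`.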

Section 10 - Premises of the tool (optional)
This section of the survey contains six questions and is optional.
Drawing on the Responsible Innovation in Health (RIH) framework, four premises clarify how this assessment tool approaches responsibility. They have been revised following the comments shared in Round 1 and you are invited to assess their importance and/or clarity.
These premises are aligned with the aim of RIH, which is to steer the design and use of D/AI solutions towards the 'right' health and social care impacts. The latter include fostering health equity as well as the economic and environmental sustainability of health systems.

The context of use largely shapes responsibility (quality threshold reached for importance)
The overall responsibility of a given D/AI solution largely depends upon how and where it is used. The tool should thus be applied in view of the social, cultural, legal, economic, and political characteristics of the context where the users concerned by the assessment are located. Potential shifts in intended use, blind spots in regulatory frameworks as well as possible shortcomings in public policies in the context of use may affect the overall responsibility of a D/AI solution and thus call for careful attention by those who apply the tool.

Responsible D/AI solutions aim for collective benefits
With the widespread use of smartphones, personal wearable devices and hospital-based devices generating digital data (in radiology, pathology, or cardiology, to name just a few), there is a strong tendency among D/AI solution developers to view health needs through an individual perspective (i.e., screening, diagnosing, predicting and/or treating an individual's health problem). As a result, they overlook key opportunities to use other types of D/AI solutions to address either the causes of ill health in large groups of people (e.g., air pollution and cardiovascular diseases) or ways to reduce or eliminate health risks for the collectivity as a whole (e.g., legislation on soft drinks or ultra-processed foods).
An individual perspective also downplays persistent disparities in the distribution of health risks and health benefits across social groups. Those who suffer from ill health are exposed to health risks that accumulate over their life course, lead to more complex comorbidities and exacerbate the mental and physical "wear and tear of daily life." From a population health perspective, D/AI solutions should be shaped by a thorough understanding of the question "Why are some people healthy and others not?" Those who apply the tool should thus recognize that although a D/AI solution that provides individual health benefits is valuable, a responsible D/AI solution should aim for broader collective benefits.

D/AI solutions should tangibly improve current processes and means
Digitalization is a relatively recent technological trend where promises abound and where opportunities around data exploitation have proliferated. Yet not all D/AI solutions are relevant in and of themselves and some may increase the overall cognitive, administrative and/or digital burden for both care providers and care recipients. Responsible digitalization should tangibly improve the digital or non-digital processes and means currently in use in health and social care. Those who apply the tool should thus examine whether the relevance of the D/AI solution is compelling and supported by research.

D/AI solutions modulate determinants of health
There is growing evidence that unequal abilities to access, use and benefit from digital tools and systems increase health inequalities, at both the individual and group levels, because digital access modulates well-known determinants of health. For instance, access to education, housing, or employment increasingly unfolds through online transactions, thereby requiring digital literacy, proper Internet connectivity, an affordable data plan, and low-cost devices that can run recent software releases. Those who apply the tool should not take for granted individuals' willingness to use a D/AI solution in health and social care and should look at the broader digital capabilities (e.g., skills and competence) and capacities (e.g., means and resources) needed to realize its likely benefits.

Next steps
Once the Round 2 data have been analyzed, we will likely make additional minor changes to the tool and then assess its inter-rater reliability using a diversified sample of D/AI solutions. The final version of the tool, its scorecard and user guide will be sent to you once they are finalized.
Meanwhile, you will receive a report with the results of Round 2.

Comments
This premise tried to overcome a flaw in the framing of this work, i.e., that a system can be assessed as being responsible or not. This starting point is an oversimplification, as being responsible is an ongoing socio-technical act that can _only_ be assessed in a given socio-technical context at a particular point in time. 'Responsibility' is fundamentally _not_ a characteristic that can be measured and that value ascribed to a system or artefact, i.e., the pertinent question is more multifaceted: "who is being responsible to whom about what in which context". Overall, the framework seems to acknowledge and accommodate this view well, but the phrasing of this premise, in implying the existence of a quality called 'responsibility', may be misleading. [I=5; C=1]
Rationale of the tool; Wording: Definition of responsibility
3. Thank you. We clarified the rationale of the tool and further explained the purpose of the premises.
Digital divide prohibits many people from accessing, using and benefitting from digital devices. [I=4; C=4]
Access to D/AI solutions
4. This is addressed in the premise about determinants of health.
The question of use is central. Probably clarify further what is meant by context: that of individuals, groups/communities, clinics/hospitals, developing country/"developed" country. Another element of context should also cover the "context of the development of the technology" itself. We need to have an idea of the circumstances and context in which the technology was developed, validated and tested. At present, there is a risk of replicating the excesses of the pharmaceutical industry, which conducts its trials in countries where regulatory requirements are lighter and less considerate of patients' and individuals' rights. Moreover, it is not just for users to know the context of use. The people who evaluate, purchase and validate them also need to know the context of use. This is not always the case. It also needs to be the same in terms of the development, validation and test contexts. [I=5; C=3]
Clarity: Definition of context of use; Context of development
5. We revised the definition.
6. We agree. The context of development is underscored in exclusion criteria (Corporate Social Irresponsibility, Minimal requirements) and attributes (Data governance, Business model).
It seems like "context of use" here is limited to the environment. It may be good to clarify the case where a D/AI solution is being used for another purpose than the original intention in the design. For example, same environment but different goal when using the D/AI solution. Then one could call this "responsible usage". [I=4; C=4]
Shifts in intended use
7. We revised the premise to highlight potential shifts in intended use as well as off-label uses.
To me, it is not clear which context "this context" refers to in the sentence "While a D/AI solution easily crosses geographic boundaries, it is more difficult for developers to know this context well." If I understand correctly, the point is that D/AI solutions cross geographic boundaries and that it is therefore difficult for developers to know "all the contexts in which their solutions would be applied"? If I understand this correctly, then I find that "this context" does not capture this meaning. Then, the last sentence says "nevertheless, the tool should be applied in view of the characteristics of the context where the target users are located." Does this therefore mean that "Even though D/AI solutions may cross borders and developers cannot anticipate and know all the contexts in which their solution will be used, they should apply the tool with the context of their target users in mind"? So it would not really be part of their responsibility to care about contexts they do not know but into which the D/AI solution could spread? If the statement means what I think it means, I wonder whether modifying it as follows would clarify it: The overall responsibility of a given D/AI solution is intimately linked to how and where it is used. While a D/AI solution easily crosses geographic boundaries, it is difficult for developers to know all possible contexts well. Therefore, the tool should be applied in view of the social, cultural, economic, and political characteristics of the context where the intended users are located. But on the other hand, if that is what the statement means, I am not sure I agree with it. In a globalized society, I think it is important to care about the spread of the solutions developed beyond the initial target context. So I find it hard to rate the importance of this premise. [I=?; C=3]
Clarity: Definition of context of use; Ambiguity in wording
8. Thank you. We introduced your suggestion ("how and where") in the definition of the premise.
9. The application of the tool requires evidence that properly reflects the context where users are located. This does not imply that the diffusion of D/AI solutions is not of concern.
Though something like the premise above is clearly relevant and important, its expression is somewhat unclear. In the second sentence, for example, it is difficult to know the significance of the clause "it is more difficult for developers to know this context well." Is this meant only to refer to the possibility that D/AI product developers are unlikely to intimately understand the various contexts in which the products are likely to be used? If so, how might developers approach thinking about the cultural, social, and other factors that would affect how their product is interpreted? [I=4; C=2] Ambiguity in wording 10. See responses no 3 and no 5.
I would like to see two concepts incorporated into this principle. First, while the principle is not perfect, to apply a phrase, "Perfect is the enemy of the good." It would be a mistake to await perfection because doing so runs the risk of never realizing the benefits. Second, the impacts of social, cultural, economic, and political characteristics can be acknowledged in the guidance given to users of the tool. [I=4; C=4] Rationale of the tool's premises 11. Thank you. See response no 3.
Maybe add some illustrations of how a given tool can be framed to take into account some of these contextual variations. [I=4; C=3] Add example 12. A user guide will be developed once the tool is finalized.
This is an important point, but "tool should be applied in view of" still seems rather vague; is there a need for a process before applying it in different contexts, for example, stakeholder engagement? [I=5; C=3] Stakeholder engagement 13. This is addressed in the Inclusiveness attribute.
While context should be assessed to understand whether a D/AI solution's potential benefits outweigh the risks it could pose to the intended users, some applications, facial recognition being one example, are controversial in most contexts. Just because a technology exists does not mean it should be used, or it may have limited uses in extreme circumstances. Further, some oppressive regimes may facilitate controversial applications or lack governance/regulation around them; as such, applications should be assessed through a human rights and/or patient safety lens as appropriate, rather than through political or economic factors (which may be hindrances rather than facilitators of good change). Moreover, many algorithms have been found to be biased and to have discriminatory outcomes; in such cases, it is important that they are trained and developed to ensure they do not harm those subjected to them, which does call for meaningfully understanding the socio-cultural factors where D/AI technologies may be implemented.

Participants who did not complete the Delphi survey
The overall responsibility of a given D/AI solution rests primarily on the framing and resolution of the problem set in the D/AI. Problem-solving D/AI is not neutral. When it is used in a specific cultural context, there is a responsibility to ensure the proper development of the users' capabilities. In other words, there is a double responsibility, in the creation and in the uses of D/AI, which must be taken care of.

Comment Issue Response
I scored 3 concerning the "clarity" because the second part of the premise is probably still a bit too broad to allow an accurate assessment as to whether the tool under consideration really meets this criterion. I guess many could argue that even "establishing an individual risk level" could be used (secondarily) to attend collective health needs/challenges. [I=5; C=3] Clarity: Individual vs. collective benefits 23. We clarified why a population health perspective better supports responsible innovation in health.
If a D/AI solution improves health for some individuals, it could naturally translate into collective benefits since these people will require fewer health resources, leaving more to others. Does that mean that the solution naturally aims for collective benefits?
The answer to this question should be clear from the description. Seems like this is linked to responsibility being in the intention vs the result. In my example, the intention could be an individual benefit, but the result is a collective benefit. If you assume that the individual and collective benefits are linked (as in the example), then can we say that the intention is a collective benefit (although the problem is approached from an individual angle)? [I=5; C=4] Clarity 24. We revised the premise for those who may not be familiar with a population health perspective.
I don't understand this, but it may be because I don't work in healthcare. [I=3; C=3] Clarity 25. See responses no 23 and 24.
The explanation text is fine, but the title is confusing in saying responsibilities _means_ aiming for collective benefits, i.e., it equates the two. Perhaps replace "means" with "includes" or "requires". [I=4; C=2] Wording 26. We reworded the premise itself.
In personalised and predictive health care it becomes increasingly important that the root cause is attended to, as opposed to merely treating the symptoms. This will be the real value of D/AI used in combination with other personalised and predictive medicines. [I=4; C=4] Clarity 27. Thank you for the example. See response no 23.
The collective benefit is important. However, there is a small risk that decision-makers and/or users will end up reducing the collective benefit to an "average". This risks leaving some groups/people who will not benefit in a blind spot, even though the technology theoretically has a high average collective benefit. The question of "proportionality" should be integrated. A collective benefit that takes into account the diversity within the target population.

Comment Issue Response
In the current formulation, this premise seems a bit "limited" in scope since it simply points at the tool. Maybe you could consider reformulating along these lines, in order to make it clearer and also more relevant (in my view): "Several D/AI solutions are being developed with the explicit intent to 'do good', that is, to alleviate social problems and/or contribute to major societal challenges such as the United Nations Sustainable Development Goals (SDGs). Though AI for Good (AI4Good) solutions may generate positive impacts, simply declaring adherence to the idea of AI4Good is not enough; it is necessary for the D/AI solution to be developed transparently, so that it can undergo an assessment of responsibility, such as that offered by this tool". Link with responsibility 51. We clarified how responsibility is defined and approached in this tool.
It depends on how responsible innovation is defined. Also, AI for good may be good for certain groups but not others, e.g., discriminatory algorithms despite their best intentions to promote health and wellbeing. Examples of where AI for good as a concept is defined and has not been responsible would improve the definition here. Determining whether a D/AI solution is for good and whether it is responsible may overlap, but the two cannot be assumed to be the same. I disagree with the last statement as stated: it implies that you will deploy the D/AI solution in order to ascertain whether or not it is responsible. This point is perfect. In the figure, you had put "Digital capabilities and capacities are 'super determinants' of health", but here you have put "Digital literacies and Internet connectivity are "super-determinants" of health". The terms and concepts should be consistent. The concept of "super-determinant" is not easy to understand. I would suggest a more understandable term to describe the concept, such as "baseline condition" or "core element". [I=1; C=3] Clarity 62. See response no 59.
It is important to note that even universal access to broadband and technical literacy will not perfectly protect against inequity. Rather, inequity might become entrenched according to who uses (or does not use) D/AI products. While universal access to these products might help reduce inequity, it could make things worse if primarily the disadvantaged are by circumstance pushed toward using digital health tools while more advantaged persons are able to access conventional health services. Generally, this seems clearly stated, although it seems to overlook equity in the design of the tools. [I=4; C=4] Equity in design 64. See the Frugality attribute.
I don't like the use of the term "Super-determinants": it's not defined here and sounds elitist. This statement: "that most D/AI solutions are likely to increase health inequalities unless universal access to bandwidth is achieved and everyone is equipped and supported to become digitally literate" ignores whether or not individuals wish to access their health care in this manner. Where is the agency of users in deciding whether they wish to become digitally literate or equipped? For many people, telemedicine is still new, unfamiliar, and undesired. This argument also misses an opportunity to address human-machine interaction in terms of how D/AI solutions might work in conjunction with employees in the health setting. [I=3; C=2] Wording; Human-machine interaction 65. We agree: individuals' wish for accessing and using D/AI solutions cannot be taken for granted (see the Human agency attribute). We revised the premise accordingly. See response no 59.
This principle appears to be more an observation than anything else. Yes, it is conceivable that individual health inequities may be increased, and that is an issue that should be examined. However, for individual D/AI solutions, the issues should be around the benefits that the D/AI solution delivers and whether the effort to achieve those benefits is justified in the face of competing priorities, rather than on inequities that arise because some individuals enjoy the benefits of the D/AI solution while others may not. What do you mean by "this tool" in this statement: "This tool thus recognizes that most D/AI solutions are likely to increase health inequalities unless universal access to bandwidth is achieved and everyone is equipped and supported to become digitally literate"? I think there is a big jump from "access to digital resources is required to benefit from AI" to "AI (which I think you mean by 'this tool') will increase health inequities because not everyone has these things". [I=2; C=2] Clarify the premises' rationale; Disagreement 72. We clarified the purposes of the premises and of the tool.

Comment Issue Response
I would maybe add "draw conclusions...that have the potential to bear on how human decisions are taken or substitute human decisions". Also, I would like to point out that, given the very broad definition of "digital solution", one could argue that "AI solution" is a sub-category of the broader category of "digital solution" (indeed, an AI solution according to your definition would also be a "system that ....process data", and thus it would also be a digital solution). If you think this is the case, then it may be better to make it explicit; otherwise it is redundant to ask "Does the solution meet one of the two definitions above?", as it would suffice to ask "Does the solution meet the second definition above?"
While it is critical to define core terms like AI, this is a fraught endeavour, as there are many competing definitions. It may be better to avoid adding yet another definition and defer to one in use by an authoritative body such as the OECD, the EU, or the ISO/IEC JTC1 SC42 subcommittee on AI. Personally, I think the one defined in the proposed draft of the EU AI Act is good in that it is very broad and covers a range of technologies; potentially even an Excel spreadsheet using statistical functions qualifies as an AI system, and if your aim is to avoid harm, why shouldn't it? Your current definition excludes AI that may be based on logical or knowledge-based technology (rather than the statistical or machine learning technologies the wording implies).
For the AI solution, are you speaking about ML solutions or deep learning? It may not be necessary to specify here, but the explanation given could vary. If just ML, for example, I would change the last part of the AI solution statement to "beyond initial programming", because in ML there is initial programming (supervised) and the system then learns from there. For an AI solution using DL, a more accurate lay description might be "without ongoing programming" (unsupervised).

Comment Issue Response
The meaning of "such purpose" in the sentence "The decision to turn a non-digital means into a new D/AI solution should thus substantially improve current means of fulfilling such purpose" is not clear. Also, it is not clear why the last sentence of this criterion repeats the definition of digital solution. [A=3] Wording 86. We reformulated this criterion into a premise.
87. We deleted the example and the misplaced sentence. I think the text lacks precision or contains implicit biases. For example, in the sentence "The decision to turn a non-digital means into a new D/AI solution should thus substantially improve current means of fulfilling such purpose and the relevance of a D/AI solution should be clearly explained.", "such purposes" has not been defined beforehand. Yet it is crucial for the application of this criterion that readers agree on what these "such purposes" are. Then, in the question on whether to include the solution, the sentence is limited to "Is the relevance of the D/AI solution explained in compelling terms?", so "such purposes" is no longer mentioned. Is the implicit assumption of this criterion that the purpose of the D/AI solution is to "Decrease the burden of care", and that the "relevance" of the solution must be explained by demonstrating how it contributes to that particular "purpose"? I therefore think it would be necessary to be more explicit about what "relevance" implies (this probably follows from the tool's premises).

Comment Issue Response
Although it entails certain risks to wait until the GA stage (e.g., the risk that de-implementing a solution/technology after it has come into broad use might be more difficult), I agree that waiting until GA offers the advantage that you evaluate the actual broad operationalisation of the technology. [A=4] Agreement 99. No specific action required.
Could there be an ambiguity here if those who use the tool are trialling an AI in a "restricted" pilot project context? Should they also reject the technology because it is not at the stage required for routine clinical use? [A=5] Clarify the purpose of the tool and screening criteria.
100. We clarified why the criterion should be applied when the tool is used in a formal evaluation.
I believe this depends on who is using the tool. As a researcher involved in projects where we develop new D/AI solutions that might never reach GA stage, we would find it useful to use the tool to orient our work. Could the recommendation rather be along the lines of "keeping in mind that the degree of responsibility can be established more meaningfully at the General Availability (GA) stage, if the solution has not reached GA stage yet, it should be reassessed at a later time"? In that case, I understand it could not be used as an inclusion criterion. I wonder whether making it an inclusion/exclusion criterion would cause the tool to be set aside by people who could have benefited from it, simply because they have not yet reached the GA stage. [A=3] Clarify the criterion's purpose 101. We clarified how the tool can be applied as a design and/or procurement brief at an earlier or later stage. There may be benefits to performing preliminary assessments before a D/AI solution reaches the GA stage, to identify macro/big picture issues. There should be some criteria that solutions must meet to move to commercialization, and these will need to be assessed. Blueprints for security and compliance tests should be enough to assess. Actual implementation is difficult to assess anyway.
[A=2] The issue is unclear 108. We cannot address the comment.

Isn't it preferable to assess responsibility and reorient it if needed as early as possible? [A=1] Clarify the criterion's purpose 109. See responses no 100 and 101.

Comment Issue Response
Very relevant, but you might rethink how to more properly define the last element ("The AI relies on biased datasets"), since many could argue that the vast majority of AI solutions are actually (at least currently) developed on partly biased datasets. You could require, for example, as an exclusion criterion for this case that "The AI does not have a strategy/reflexivity element on how to minimise potential biases in the datasets used for its creation" (or something along these lines). [A=4] Wording: biased datasets 110. Thank you. We revised the set of practices described in this criterion.
It is difficult to say right at the start that data protection measures have not been established. How would you use this criterion to rule out some innovations right at the start? [A=1] Applicability: data protection 111. The information sources needed to apply the criteria are indicated.
Data reselling may not be a sound reason for exclusion: it could be done to help, e.g., with new drug research, and if it is done with clear informed consent, that may not be a problem. It is not reselling that is the problem, but its being done without full disclosure. Deliberate deception implies knowledge of intent, which can be difficult to determine, whereas if it is just the result of poor design practice, then this may be a reason for a poor score and corrective measures, but exclusion may be disproportionate. _Relies_ on biased datasets implies we are talking about data somehow central to the design, whereas less central use of biased data, e.g., for testing, could be interpreted as not reliance, yet nevertheless introduce harms. Also, the implied approach places the burden of assessing bias on the assessor, whereas a better approach is to exclude if the approach and method of bias tests are not provided. [A=4] Relevance: data reselling
It's hard to define what is "minimally responsible". I'm uncomfortable with "Data reselling is the primary business model". For me, from the moment there is resale of data (whether it is the main or a secondary activity), it raises a problem. The exclusion criteria, and even the screening part in general, are already an evaluation for me. I understand the idea of the screening, but it's a kind of "initial assessment on the 'musts' before moving forward". Secondly, the social irresponsibility and minimum responsibility part partly overlaps with the "Organizational value attributes" dimension, because it covers, at least partially, the business model (e.g., resale of data).
I think the last point is difficult to apply: "The AI relies on biased datasets: The dataset used to train an AI solution may be biased, may produce results that cannot be generalized to the entire population of intended users, may lead to unfair decisions against particular individuals or groups or may entice discriminatory behaviors. When the appropriateness of the dataset used to train the algorithmic system has not been properly validated, the solution should be excluded from the assessment." This is because what does or does not constitute a "biased" dataset can vary greatly from one disciplinary context to another. I think it is absolutely essential to encourage unbiased datasets, but truly grasping what is or is not a biased dataset in different contexts is difficult. I wonder whether, as for the criterion that discusses "relevance", there should be a formulation more along the lines of "absence of compelling argument that the dataset has been developed so as to minimize all potential biases". [A=3]
Some of these practices are very difficult to prove, so it might be difficult to make them explicit exclusion criteria. [A=1] Applicability: unspecified 117. See responses no 110 and 111.
This is perhaps unavoidable, but the final two criteria above strike me as impossible to measure consistently without a more precise rubric. Whether a product lacks cybersecurity or data protection measures, for example, is often not very clear. This isn't necessarily an objective measure, for what counts as a "proper measure" might vary greatly according to context and actual use of a product. It might also be worth considering that data protection measures (as well as the reliance of a particular AI on biased data) can be quite difficult to assess in the abstract. These criteria would benefit from a more thorough description of how they can be measured.
It is not clear from this wording whether the problem is that reselling is the 'primary business model' or that this is not clear to the user. [A=3] Clarity: data reselling 120. See response no 112.

Comment Issue Response
You may need to rethink the element "To fully understand a given algorithmic decision or advice", since a lot of AI solutions might not meet it, and there is disagreement (at least in ethics) on how relevant "explainability/understandability" is (or even on the definition of these terms). The element "To know how to contest such decision or advice" seems to me the most important. You spoke of digital literacy in the premises; I think that in the future we might need something like "AI literacy" (i.e., learning how to deal with advice, suggestions, recommendations, etc. given by an AI solution). Also, you might want to specify whether the elements have to be present alternatively or cumulatively in the assessed AI solution.
The measures mentioned are all ex post, i.e., they apply once the D/AI application is already deployed. This misses the need to engage with affected stakeholders in the proposal and design of the application, e.g., through participatory design or by giving patient groups whose data is used in training and testing the application a stake in decisions about its deployment and use. On the latter, we must remember that these applications are completely reliant on patient data, but once consent is given, individuals lose any agency to guide how that data, even once transformed into an AI model, is used. Also, such agency offered to individuals is less powerful than what might be possible if exercised collectively, e.g., via patient groups acting as data unions or data co-ops.
"To be heard and have one's rights protected when discrepancies arise": it is not entirely clear to me what the discrepancies are between. Discrepancies between the user's perception and the system's recommendation? Also, it is not entirely clear to me what rights such discrepancies would infringe on, and who would be the key stakeholder in charge of listening. This seems very important to me, but I am wondering how it would be addressed practically.
I wonder if this is a point specific to each solution, or if it is a more general point that should be part of policies regulating D/AI at a societal level.
This is in my mind an exceptionally important consideration, though I worry that the capacity of a human user to engage in their own preferred course of action will often run up against the tendency of humans to reflexively trust automated processes. That human decision-makers might sometimes fail to scrutinize machines with the degree of care that the situation demands might warrant consideration here. It might also be worth expanding on the criterion of fully understanding "a given algorithmic decision or advice," particularly by referencing rights to explanation that have been adopted in the European Union, Quebec, and elsewhere. [I=5; C=3]
"Fully understand" seems a steep standard in light of the literature on informed consent; perhaps what should be aimed for is the provision of clear information at a certain grade level. It might also acknowledge that there can be some trade-offs between agency and obtaining information of scientific value, etc. All efforts must be made to ensure that data subjects and others are informed and understand the risks and benefits, for their acceptance/adoption of technologies. People should be able to change their minds where appropriate and feasible, as with individual-facing technologies, though this may be more difficult with technologies that are collectively used or clinician-facing. However, mechanisms should be put in place for regular monitoring of individuals' preferences, for example, through dynamic consent or data access committees/representative panels. [I=5; C=4] Dynamic consent 135. See the ELSIs and the Human agency attributes.
This is well stated, but the procedures necessary to achieve these objectives will be difficult to implement. For example, enabling an individual to fully understand an algorithmic decision may require disclosure of proprietary information about the algorithm, which developers may be reluctant to disclose. [I=4; C=4] Applicability 136. The scale was developed and clarifies how the attribute can be applied.
Is it useful to specify "fully understand a given algorithmic decision or advice" rather than "understand a given algorithmic decision or advice"? The phrase "by having full control over the way data about them is being collected and used" seems somewhat reductive to me (moreover, the following statement covers much more ground). It could be reworded to add "a better understanding of how a D/AI solution works and having sufficient control..." (full control also seems utopian to me; sufficient is more realistic).

High degree of responsibility for the Human Agency attribute
As I mentioned before, the fact that the AI solution comes with a way (whether a component or a procedure) to teach how to deal with it (I called it "promoting AI literacy" in the previous comment). [I=4; C=3]
Clear procedures and training of staff for responding to concerns raised by users in a timely and engaging fashion. Notification and access points to enable users to raise concerns. Testing of explanations with contextually appropriate test users to ascertain whether the explanations are understandable and useful for the target audience. [I=5; C=4]
This attribute seems in line with decision support systems rather than systems aimed at replacing human experts. From what I understand, a high degree of responsibility for this attribute would be achieved by a D/AI solution that makes a prediction (or recommends an action) using a model that is either transparent (e.g., a decision tree) or that can be queried further to understand its prediction, and whose prediction would be used to guide a human expert in their decision. The human expert could decide to act or not according to the model's predictions, and that decision could be logged to further improve the model.
A high degree of responsibility would involve giving the data subjects of health data a real, ongoing, and non-repudiable say over how their data is used, including a stake in the control of such data. Supporting mechanisms for exercising this control collectively, with strong democratic controls, would also help balance the power between patients overall and those wielding D/AI technology. [I=4; C=3]
Ability for the human to access more information, if he or she so wishes, and to consider it over a period of time with which the patient is comfortable, in order to make a free decision in line with his or her values and beliefs.
[I=5; C=5]
The technology makes it possible to leave the final choice (the freedom to do and act) to the human, and it does not in any way challenge the possibility of human action (e.g., by the patient or professional). [I=5; C=4]
I have very limited time to complete the study; answering this question would require more reflection time than I have. I am sorry to participate in this limited way, but I hope the answers I provide, even if incomplete, will still be of some use to your team. [I=5; C=4]
Using the solution should improve user agency: for example, a watch that monitors and lets you know your own physical activity improves your agency, whereas the same watch, if it centralizes the data and forces adoption of other proprietary solutions, decreases your agency.
Highly responsible AI would generally be reviewable, though it is important to note that certain systematically inscrutable products, such as unexplainable deep learning, might not be susceptible to careful review. In these instances, it would not be responsible for developers to give users the impression that the AI product can be reviewed or explained if this is not so. It might in some instances be MORE responsible to signal a product's unexplainable character than to falsely or misleadingly suggest that review is possible. Use of unexplainable products may be responsible insofar as the product in question satisfies other criteria listed in this section, such as knowing when a product is in use. Attentive balancing of these factors would be important.
I think an additional characteristic might have to be added, which could somehow ensure that the worry you rightly express (i.e., that the D/AI solution operates "without creating additional cognitive and administrative burden") does not materialise, e.g., a D/AI solution that allows a decision to be taken more quickly but requires much more time for inputting the data.
This characteristic could be something like "the D/AI solution does not require additional chores of the patient and/or the healthcare (or social) worker, which would negatively disrupt the provision of care".

This attribute seems to conflate interoperability, which yields benefits in terms of lifetime costs of systems (including procurement, testing, and QA costs), with the quality of user friendliness; it also confuses system users and data subjects (often not the same, e.g., patients vs clinical professionals). [I=4; C=1] Scope of attribute 142. We aim to work with a limited number of attributes. We revised the definition to increase clarity.
In my opinion, this attribute could be subdivided into two. There is human-centred interoperability, but there is also the question of ergonomics and usability, which refers, among other things, to the "human-machine" interface. You can put them together in one attribute, but they could be two attributes as well.
One of the criteria is that "non-proprietary software solutions are used". That seems to me to state the issue incorrectly. I view the question as whether there are mechanisms that allow an individual to understand the decision that was reached and how it was reached. There are ways of achieving this result other than allowing only non-proprietary software solutions. The criterion should be whether these alternative ways deliver the desired result, rather than whether a particular mechanism is used.

High degree of responsibility for the Human-centered Interoperability attribute
That it is evaluated by the patient/care-receiver and the (health)care provider as actually making the provision of care more seamless and increasing the (quality of) time dedicated to the care-receiver. [I=4; C=4]
A D/AI solution with a high degree of responsibility for this attribute would be available on all platforms (e.g., Windows, OSX, and Linux), and users would be able to export their data in a general, nonproprietary format (e.g., a CSV file). [I=5; C=5]
Very important criterion. One aspect that is not explicit in the definition and explanation of the criterion is that digital or AI solutions need to be interoperable with existing information systems within an organizational context. It is not only the human component that is important but also the technical one. [I=5; C=5]
A high degree of responsibility would be characterised by having a clear, continuous register of the overall health system into which the D/AI application is integrating, and of the stakeholder goals addressed by those system goals, so that gaps in stakeholder agency and participation can be identified, filled, and monitored over time as more systems are added and integrated. [I=4; C=1]
A technology with fewer connection and adjustment steps with the user's technologies and infrastructure. It should also be as intuitive as possible. In other words, the technology should be developed with a "universal precautions" approach to health literacy, in which organizations design communications strategies with the assumption that any patient may need literacy support, rather than seeking to identify subsets of low-literacy patients for special attention. Universal-precautions measures include writing actionable content,114 using plain language, using visuals such as pictographs, and minimizing text-based input.
You introduce reference to safety and quality standards, which is positive, but these go beyond the scope of data governance and would also be applicable to other attributes. Also, there are data governance standards you could be referencing.
It is not clear to me what "explicit compliance to the laws and regulatory frameworks where users are located" might tangibly mean in this context. One of the significant characteristics of digital health and AI is that the rules surrounding their use and design are often unclear or unsettled. There is significant uncertainty in Quebec and Canada, for example, about what precisely the law expects of D/AI developers, as well as D/AI users. As a criterion for measuring responsibility, therefore, legal and regulatory compliance may assume that the normative landscape at present is more developed than it is in reality.

High degree of responsibility for the Data Governance attribute
The presence of certification systems for data handling, and also a certain investment (e.g., in training programs) in 'human-ware' (i.e., in the responsible data governance skills of those who actually handle data). Performing a data protection impact assessment (along the lines of what is required, at least in Europe, by the GDPR) could also be considered. In terms of data infrastructure, an important element could be transparency about the origin of the infrastructural elements on which the D/AI solution is based.
Perhaps you should say something about how this assessment sits within regulatory frameworks, e.g., the GDPR and the impending AI Act in the EU, and how it will be updated to accommodate changes to regulation in different jurisdictions. It is not clear at this point whether the framework can even be taken and adapted or extended by people in different jurisdictions.
It would seem helpful at this stage to define responsibility as a background concept.
Equity in design
I would split premise 2 so you have individual responsibilities and collective responsibilities. Secondly, I would split premise 4 into digital literacies and access to the internet of things: the two are related but should be separate issues for assessment, as one can have access yet lack the skills to use technologies, and vice versa. Additional premises could relate to: data/digital infrastructure enabling D/AI innovation and dissemination; the governance/regulation to which D/AI innovations would be subject; whether the D/AI solution is necessary or provides a better solution than existing methods; and, lastly, whether benefits for users/patients/health systems outweigh benefits to commercial entities, or are at least proportionate to them, to avoid exploitation of data subjects and users.
The second paragraph in the attribute definition seemed opaque and unnecessary and made it difficult to follow the overall logic of the section. It is not clear why a statement needs to be made about what is not known, as this is an assessment tool, not a research tool. I suggest deleting it. Referring to: "Though D/AI solutions may improve population health by facilitating a range of human decisions and actions, little is known about the way D/AI solutions affect in practice user behaviour, cognition and judgement (e.g., overreliance, avoidance, overconfidence, hypervigilance) and thus their impact on care seeking behaviours and on health and social care provision." Scale level C could be more clearly worded, e.g., "Procedures that EITHER enable users to understand its outputs OR to decide and act in accordance with their own goals. There are no formal means for users to have their concerns acted upon" [C=2; A=4] Clarity Wording 186. We agree with these suggestions and slightly revised both the definition and the scale.
I feel this attribute is very clear; my concern is minor. I don't quite understand why peer pressure is part of the attribute: "decide their own preferred course of action without undue pressure from the D/AI solution itself and from peers" [C=4; A=5] Peer pressure 187. Peer pressure may limit human agency in professionalized settings such as clinics and hospitals, schools, or workplaces.
The only challenge I see here is people's capacity to make the decision (according to their goals: this is somewhat reminiscent of the theory of justice and of "capabilities"). People and groups are not equal with respect to the information "available" to make an informed decision. This refers to Simon's theory of bounded rationality (among others): the idea that an individual's decision-making capacity is impaired by a set of constraints such as lack of information, cognitive biases, or lack of time. From this perspective, people tend to choose satisfactory rather than optimal solutions. And here, this again refers to the notion of capabilities ("what a person is able to do or be"): those who have nothing settle for nothing, those who have little settle for little. We chose not to reorder these paragraphs because the logic across the tool is to first define the attribute and then provide its rationale. We did, however, revise the scale to increase the correspondence between the definition and the levels of the scale.
"Testing the D/AI solution in the context of use before its full deployment and regularly assessing how it interfaces with users' evolving digital" → To me, "before its full development" seems to assess a phase that might precede the GA stage, and the D/AI solution would thus be excluded from formal assessment. Also, "testing before its full deployment" is not mentioned in any of scale levels A to D, so maybe it should not be present here (I do believe it is an essential step, but it does not seem to help with the evaluation in the frame of this tool). When reading "B-Is periodically adjusted to fit users' digital infrastructures and aligns with their data management practices, but provides limited data sharing functionalities", I have a feeling that it blurs the line between the first two points of the attribute definition, because the first point says: "aligns with user capabilities, needs, work processes and task allocation to minimize cognitive and administrative burden;". Here, ALIGN seems to refer to individuals' capabilities, needs, etc., while in the rating scale, ALIGN seems to refer to a system. It might well be that the system of data management practice in place is not optimal. Also, I believe the first part of the first point, "Designing a solution that is operable on widely available systems and devices", and the last part of point 2, "and using non-proprietary software or solutions that facilitate data exportation", are quite similar. When reading the attribute, I can distinguish between points one and two, but when I read B on the scale, suddenly I'm not so sure about the distinction. So I believe something should be clarified either in the attribute description or in the way scale point B is phrased. Scale misses elements (testing, adjustments); Inadequate data infrastructures or practices in health settings 193. We reformulated the attribute and the scale, and removed "testing the D/AI solution…".
Maybe it is because "data exportation" is used in the attribute description but "data sharing" is used in the scale? Also, I wonder why nothing is mentioned about adjustment in C and D. I know you don't want to make a scale that looks for "defects" but rather for "what's there". But here, it seems to me that the main difference between B and C is in fact the absence of regular adjustment, and stating it clearly would make the scale more transparent for users. Doing so would perhaps also allow you to state D on the scale in a similar way as you do for other attributes ("No particular human oversight procedures"), e.g.: C-Either requires substantial adaptations to users' data management practices or provides limited data sharing functionalities and is not periodically adjusted to fit users' digital infrastructures 195. See response 20.
The only question I have is who would answer "Requires substantial adaptations to users' data management practices and provides limited data sharing functionalities". If I am part of an organization, it feels like stating this during a self-assessment would require a tremendous amount of insight, honesty, or transparency. Therefore, I would rephrase it slightly to make it easier for this category of assessors to choose this option. If it is an external assessment, it makes complete sense. [I=5; C=4; A=4] Applicability 196. We revised level D of the scale.

Software frugality attribute (I= threshold reached; C = threshold reached; A = threshold reached)
The attributes appear here simply as good software development practices; I do not clearly see the link with frugality... maybe this? https://digitalprinciples.org/ [I=2; C=2; A=2] Frugality in software 197. Thank you for the website reference, which is a great resource for D/AI solution developers. We relied on the frugal innovation literature to identify these three key characteristics.
One thing you may add to the definition of the attribute (I am unsure whether it would better fit under "affordability" or "optimized performance", probably the latter) is that the D/AI solution should not generate some kind of 'externalised' costs (for lack of a better word). For example, a clinical D/AI solution may be affordable in itself (e.g., clinical software that does not cost much) but require many (costly) hours of training for the clinical personnel who want to use it. I think you already point at this issue when you mention the example of "data plans being unaffordable", but you could be more explicit. Again, the key issue would be to avoid a D/AI solution that is cheap only because the related costs are 'externalised' (e.g., to the training needed for personnel who wish to use it, or to expensive data plans). Theoretical precisions 200. Frugal innovation designers certainly look for such opportunities because they increase the number of potential users while maintaining a focus on core functionalities.
As you state in the introduction, it is very possible that one criterion in a specific context makes it completely acceptable: "For example, in certain situations, it may be legitimate to have a lower score on the Health relevance attribute to reach a higher score on the Eco-responsibility attribute (or vice versa)". I am openly wondering whether the option "One characteristic of software frugality" therefore really reveals a total lack of frugality if, let's say, in a specific context Affordability was the key, most difficult thing to accomplish and could, in that context, demonstrate a major effort to be "frugal". Sorry if it does not make sense. [I=no response; C=5; A=5] Theoretical precisions 201. Thank you for the question. A 'C' on the scale does not imply a lack of frugality (a 'D' does).
"Affordability" of a given D/AI solution may indeed be appraised in view of the currently available solutions (or cost of not intervening).

Data governance attribute (A = threshold reached)
This criterion might not be applicable uniformly across all solutions. What about a solution that trained on publicly available data? No data governance involved here, that would be NA. [A=2] Applicability to publicly available datasets 202. This is an intriguing point! It suggests that datasets 'downloaded' from a public source would not require data governance. We rather believe that good data governance practices are an important responsibility feature, no matter the origin of the datasets.
In my opinion, you should specify what you mean by "a third party". I understand what you are referring to, but it would be relevant to add a qualifier such as "independent, external, certified, recognized". It would probably also be relevant for data governance to be verifiable through "crash testing" (there are other types as well: smoke testing / comprehensive testing), which makes it possible to verify and evaluate the robustness of the software, both the code and the elements related to data governance. "Crash testing is the innermost technique operates on each code check-in of the GUI software and it is executed frequently with an automated GUI testing intervention and performs quickly also. It reports the software crashes back to the developer who checked in the code."
I understand that it is difficult to make an "exhaustive list" in the attribute, and thus you write "may include" and then list 'only' three (acknowledging that there may be more). However, a risk in doing this is that it might be difficult to evaluate a tool that has some "practices of programming and software eco-responsibility" that differ from the three you listed. Thus, why not have (in the scale section) something like: The D/AI solution relies on: A. "Three practices of programming and software eco-responsibility or more. If such practices differ from the ones explicitly listed in the attribute, there must be a clear explanation as to why they count as "practices of programming and software eco-responsibility" B. Two practices of programming and software eco-responsibility. If such practices differ from the ones explicitly listed in the attribute, there must be a clear explanation as to why they count as "practices of programming and software eco-responsibility" C. One practice of programming and software eco-responsibility.
If such practice differs from the ones explicitly listed in the attribute, there must be a clear explanation as to why it counts as "practice of programming and software eco-responsibility" D. None of the programming and software eco-responsibility practices [I=4; A=3]

Wording
Other potential ecoresponsible practices 207. We have revised the definition and the scale accordingly.
I find it strange that A states "three practices or more" while only 3 types of practices are described. I believe this makes it a bit confusing because it leaves the door open to interpretation. For example, if I evaluate a D/AI solution that has programming techniques that substantially reduce the quantity of energy and time required to develop a D/AI solution (e.g., tinyML) but not modeling or computational techniques, I would rate that as "one practice"; if this solution also uses highly energy-efficient CPUs, that would be another practice, and the solution would be given a B for 2 practices. But if the D/AI solution has, e.g., programming AND modeling AND computational techniques that ... but nothing else, should I rate it as "one practice" (C), or would I count that as 3 practices and rate it an A? [I=5; A=4] Definition and counting of 'practices' 208. See response 34.
I absolutely love the attribute and love the propositions. But re-reading them a few times, I think they could be complemented by quite a few other practices. I recently worked on methods for calculating the CO2 emissions of machine learning algorithms, and we tried to apply them to one of our small-scale AI projects. It fared poorly. That made me think that the options you list might (in my view) make more sense for large-scale projects, where (in the best-case scenario) all the information about what third-party services (e.g., AWS) offer/use/do is available. But smaller projects or tools that would use your tool might always get the worst answer because it is hard to meet any of the options you highlighted (which is also a possibility that reflects poorly on the lack of consideration of eco-responsibility in AI, for instance). [I=5; A=4] Agreement Other potential ecoresponsible practices 209. Thank you for these insights. We agree that the practices we listed may not exhaust all possibilities, but adopting a large-scale perspective on the organization's eco-responsibility practices is better aligned with the intent and scope of RIH.
I feel very divided about this premise. It seems to assume that D/AI solutions emerge from engineers who are (apart from "making AI") "discipline neutral". I am thinking, for example, of Parkinson's disease (PD), where there is now a growing consensus that air pollutants might be part of the etiology of the disease. Our team of speech language pathologists (SLPs), persons with PD, and engineers is looking at a D/AI solution to make SLP services more accessible. We recognize that a larger solution (not necessarily D/AI) is needed to address the systemic problems leading to the development of the disease, but such solutions are much more long term, and our own expertise is not adequate to address this larger problem.
I feel that there is a limit to the extent of population health perspective a specific D/AI tool can include without exhausting the team. But I recognize and strongly believe that the population health perspective is crucial to embed in all our "disciplinary" health perspectives (and that is not specific to D/AI), so I wonder how this premise could acknowledge this fact without minimizing the need for D/AI solutions to address or be mindful of the larger population health perspective. Maybe it is something about the phrasing of the last section: "Those who apply the tool should thus recognize that although a D/AI solution that provides individual health benefits is valuable, a responsible D/AI solution should aim for broader collective benefits." Who are "those who apply the tool"? Is it the users and not the developers? I feel that both the developers and those who deploy and use the tools should recognize that. I feel we can have it as a premise that those who develop D/AI solutions should recognize this fact, but I don't feel we can push it on users; rather, the developers should be transparent about it. What does "recognize" mean? Maybe give an example of how this could be recognized: is it through a value statement? Is it something else? I feel this premise is very important, and the mere fact that it divides me makes me aware that there is work to be done here.

D/AI solutions should tangibly improve current processes and means (I= threshold reached C= threshold reached)
One main issue is that there is a growing global shortfall in healthcare workforce numbers, so D/AI solutions may be the only course of action in some instances, e.g., in low-risk applications such as back-office functions, whereas in high-risk clinical applications, ensuring that the technologies work without negatively affecting patient safety and clinical burden is paramount. However, due diligence is needed to understand the context in which they are to be implemented, and support should be provided for successful adoption. Rationale of the premises 221. The premises are not used to assess responsibility but to guide evaluators' reflections on the broader issues raised by the D/AI solution under assessment. Only the rating scales of the attributes assess the degree of responsibility and are used to calculate the score.

D/AI solutions modulate determinants of health (I= threshold reached C= threshold not reached)
Not sure if this aspect should be covered as determinants of health is a complex topic. [I=3; C=3] Rationale of the premises 222. We agree about their complexity, but evaluators should be aware that determinants of health affect the responsibility of D/AI solutions.
Those who apply the tools should actively look for ways to support the underserved. Those who propel the technologies must plan for and identify potential inequities and offer ways to minimise and mitigate them. This may not be a straightforward process and could be done over time with bottom up approaches.

End of survey comments
Thank you for this huge work! I look forward to using this tool.

C. Phase 3: Supplementary material
C. 1. Selection and documentation process of the D/AI solutions
The objective of Phase 3 was to assess the reliability of the Tool and to make measurement revisions if needed. Following Gwet's recommendations, an error margin of ±0.20 was used to determine our sample size, that is, 25 D/AI solutions [36]. To create a balanced and diversified sample of real-world solutions, we followed three steps.
Step 1: Preliminary list of D/AI solutions
We reviewed 5 relevant sources of information describing several types of D/AI solutions to define and test eligibility criteria:
• We searched the Internet and included 45 D/AI solutions using the following 4 eligibility criteria. The solution: 1) meets the definition of a digital solution operating with or without AI; 2) addresses at least one determinant of health; 3) is already available for use (in the Americas, Africa, Asia, Europe, or Oceania); and 4) its developer makes freely available on its website, in English, French, or Portuguese, the information required to document the criteria and attributes of the Tool.
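The four eligibility criteria of Step 1 can be read as a conjunctive screen: a candidate is retained only if all four hold. The following minimal Python sketch assumes that reading. The `Candidate` record, its field names, and the `is_eligible` helper are hypothetical illustrations, not part of the Tool, and criteria 1 and 2 are reduced to judgements the reviewer has already recorded.

```python
from dataclasses import dataclass

# Regions and languages named in criteria 3 and 4.
ACCEPTED_REGIONS = {"Americas", "Africa", "Asia", "Europe", "Oceania"}
ACCEPTED_LANGUAGES = {"English", "French", "Portuguese"}


@dataclass
class Candidate:
    """Hypothetical record of what a reviewer noted about one solution."""
    name: str
    meets_dai_definition: bool        # criterion 1 (reviewer judgement)
    determinants_addressed: int       # criterion 2: count of health determinants
    available_in: set                 # criterion 3: regions where it is in use
    documentation_languages: set      # criterion 4: languages of freely available docs


def is_eligible(c: Candidate) -> bool:
    """Return True only if all four eligibility criteria are satisfied."""
    return (
        c.meets_dai_definition
        and c.determinants_addressed >= 1
        and bool(c.available_in & ACCEPTED_REGIONS)
        and bool(c.documentation_languages & ACCEPTED_LANGUAGES)
    )


# Usage with invented candidates:
print(is_eligible(Candidate("app-1", True, 2, {"Europe"}, {"French"})))   # True
print(is_eligible(Candidate("app-2", True, 1, {"Asia"}, {"German"})))     # False
```

Treating the criteria as a strict conjunction mirrors the wording "using the following 4 eligibility criteria". Any partial-credit scheme would be a departure from the text.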
Step 2: Final sample of D/AI solutions
Based on the preliminary list of D/AI solutions, our team members, co-researchers, and collaborators validated the selection of 25 solutions that constituted a balanced and diversified sample by including solutions operating with or without AI, pursuing different purposes, developed by diverse organizations, and used in different regions. Table 4 shows how the final sample met our diversification criteria.

Type of developer (number of D/AI solutions, n=25)
For-profit organizations (including private hospitals and clinics): 14
Not-for-profit organizations (including universities and NGOs): 8
Public agencies or informal (user-led) associations: 3
Table 4. Overview of the sample of D/AI solutions used to assess interrater agreement

Step 3: Documenting the 25 D/AI solutions for assessing interrater agreement
For the two raters (RRO and LR) to apply the Tool as intended, we searched the website of each solution in our final sample to collect information addressing the Tool's criteria and attributes (terms of reference, privacy or sustainability policy, user guides, governance structure, annual reports). We tabulated relevant excerpts for all 25 D/AI solutions in an Excel 'scorecard' that both raters completed independently. Because start-ups tended to share less detailed documentation than large firms, PL adapted content found on other developers' websites so that the scorecard contained all the information needed to score each criterion and attribute for all 25 solutions.
We provide below anonymized examples of information found or adapted for each screening criterion and assessment attribute.

Nondisclosure of D/AI risks (applicable to both digital and AI-based solutions)
Developers do not disclose the non-resale of data. Personal data relating to users are kept for a period of 6 months, then deleted. The statistics resulting from the use of this information can be kept for up to 25 months, then deleted, or anonymized. The user has the right to access, rectify, delete or export his or her data, or to oppose the processing of his or her data by [name of company] (or to request its limitation).

Nondisclosure of D/AI risks (applicable only to AI-based solutions)
[Name of company] is the most extensively validated AI technology in the world. It has been validated in a pivotal, prospective, multicenter clinical trial against the rigorous clinical reference standard using the Early Treatment Diabetic Retinopathy Study (ETDRS) grading scale by experts at the University of Wisconsin Reading Center. It has been tested in a clinical validation study on over 100,000 patient visits, one of the largest data sets used to test any available diabetic retinopathy screening technology, in demanding, real-world clinical environments using images captured in everyday practice. It has also been independently validated by UK NHS in a study with over 30,000 patients.

GA stage not reached
[Name of product] is available for purchase on the developers' website. It has obtained US FDA clearance.

Human agency
We work with a network of vetted linguists to translate messages and evaluate system accuracy to ensure our translations are of the highest quality. Overall, 99% of the messages sent through [name of product] offer human translation support. In-product features like on-demand "help me understand" human-reviewed translation and videos with translated captions are just a few examples of innovative tools we've built into [name of product] to support understandable, accessible communication. Teachers and families find in-app tips and guidance to support positive relationship-building. Data for personalization: we measure feedback and actions of families and educators to refine what in-app coaching content is helpful and useful to build capacity, and for whom, allowing us to eventually personalize the experience.

Care-centric interoperability
The [name of product] is used widely as a job aid by skilled birth attendants working in the periphery of the health system in low-and middle-income countries. The app works offline once downloaded, so healthcare workers in even the most remote settings can always refer to it. The app consists of 12 content modules addressing key interventions of childbirth emergencies and preventative procedures (infection prevention, prolonged labour, removal of placenta, etc.), all aligned to international clinical guidelines. The app can be used during various types of in-service trainings as a teaching aid for the training instructor and as a study aid for the training participants to support their on-going professional development.

Data governance
Machine learning model governance and transparency are fundamental to [name of product] predictive power. Our company strongly believes that data driven solutions are only as good as the data they rely on, that transparency and accountability are the only path to trustworthy AI, and that vast amounts of data means endless possibilities for data disorganization. Because our company's predictive power relies on optimizing the data lifecycle (e.g., facilitating rapid and consistent labeling, processing, and querying for operational machine learning), our Head of ML Science is accountable to the Board of directors for ensuring that [name of product] data governance practices remain aligned with a high-quality data-driven mission. This includes regularly reviewing how employees as well as any (authorized or non-authorized) third parties access our databases, report on, and correct any uses that deviate from our policy.

Programming and software eco-responsibility
Our Sustainability team develops and implements our environmental strategy. The team is responsible for setting targets to reduce energy use by our programming units and reporting annually on our progress. Our Chief Financial Officer is responsible for facilitating the ongoing assessment and audit of our ISO 14001:2015 Environmental Management System (EMS). In 2019, we completed the required external audits to certify our EMS to the updated ISO 14001:2015 standard.

C. 4. Revision of the 'Human agency' attribute
As we first reached only "moderate agreement" for the 'Human agency' attribute (Table 7), we decided to revise the definition and perform a second interrater agreement assessment, in line with our Phase 3 objective of making measurement revisions if needed. Informed by the diverging interpretations of the two raters, the elements in red were deleted and those in blue were introduced in the revised and definitive version, while the scale remained unchanged.
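The agreement labels used in this section ("moderate", "substantial", "almost perfect") correspond to the Landis and Koch benchmarks for chance-corrected agreement. The sketch below assumes an unweighted Cohen's kappa computed over the two raters' A to D ratings. This section does not state which coefficient was used, and the study may instead have relied on, for example, Gwet's AC1 or a weighted kappa. The ratings in the usage example are invented for illustration.

```python
from collections import Counter


def cohens_kappa(rater1, rater2):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("Both raters must score the same, non-empty set of items")
    n = len(rater1)
    # Observed agreement: share of items given the same level by both raters.
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Expected chance agreement from each rater's marginal distribution.
    c1, c2 = Counter(rater1), Counter(rater2)
    p_e = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / n ** 2
    if p_e == 1:
        # Both raters used a single identical level; kappa is undefined,
        # so this sketch reports perfect agreement by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def landis_koch(kappa):
    """Map a kappa value to the Landis and Koch benchmark label."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Invented ratings for five solutions on one attribute:
r1 = ["A", "A", "B", "C", "D"]
r2 = ["A", "B", "B", "C", "D"]
k = cohens_kappa(r1, r2)
print(round(k, 3), landis_koch(k))  # 0.737 substantial
```

Unweighted kappa treats every disagreement alike. Because the A to D scale is ordinal, a weighted variant would give partial credit to adjacent levels, which is one reason the study's actual coefficient matters.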

Original definition
Refers to the capacity of individuals and groups to actively and independently decide and act in accordance with their own goals when using a D/AI solution.
Though D/AI solutions may improve population health by facilitating a range of human decisions and actions, they may affect user behaviour, cognition, and judgement (e.g., overreliance, overconfidence, hypervigilance) and thus have unexpected impacts on health and social care seeking and provision behaviours. Responsible D/AI solutions can support human agency towards proper health and social care by enabling individuals and groups:
• To understand the measures, recommendations, or decisions of a D/AI solution (e.g., data visualization, plain language recommendation, transparency if an AI-based solution is unexplainable);
• To discuss with health and social care managers and/or clinical staff the measures, recommendations, or decisions of a D/AI solution (e.g., dedicated point of service);
• To act in accordance with their own goals without undue pressure from the D/AI solution itself and from peers (e.g., freedom to use one's judgement or override an AI-based decision, clinical guidelines);
• To have their concerns acted upon through a formal committee for audit, review, appeal, and redress mechanism (e.g., ombudsman, committee).

Revised (definitive) definition
Refers to the capacity of individuals and groups to actively and independently decide and act in accordance with their own goals when using a D/AI solution.
Though D/AI solutions may improve population health by facilitating a range of human decisions and actions, they may affect user behaviour, cognition, and judgement (e.g., overreliance, overconfidence, hypervigilance) and thus have unexpected impacts on health and social care seeking and provision behaviours. Responsible D/AI solutions can support human agency towards proper health and social care by actively enabling individuals and groups:
• To understand the measures, recommendations, decisions, or outputs of a D/AI solution (e.g., data visualization, plain language recommendation, transparency if an AI-based solution is unexplainable);
• To discuss with managers and/or dedicated staff the measures, recommendations, decisions, or outputs of a D/AI solution (e.g., point of service);
• To act in accordance with their own goals without undue pressure from the D/AI solution itself and from peers (e.g., freedom to use one's judgement or override an AI-based decision, clinical guidelines);
• To have their concerns acted upon through an appeal, audit, review, or redress mechanism (e.g., ombudsman, committee).

Scale (unchanged)
The D/AI solution is accompanied by:
A. Three of the described enablers or more
B. Two of the described enablers
C. One of the described enablers
D. None of the described enablers
Table 8. Changes brought to the 'Human agency' attribute for the final version of the Tool
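The Human agency scale converts the number of enablers documented for a solution into a level from A to D. The following minimal sketch assumes that each of the four enablers in the definition is counted at most once, which is one way to avoid the counting ambiguity raised by an e-Delphi panellist about the eco-responsibility scale. The enabler keys and the `human_agency_level` helper are hypothetical labels, not names used by the Tool.

```python
# Hypothetical short keys for the four enablers in the revised definition:
# understanding outputs, discussing them with staff, acting on one's own goals,
# and having concerns acted upon through an appeal/audit/review/redress mechanism.
ENABLERS = frozenset({"understand", "discuss", "act_own_goals", "appeal_redress"})


def human_agency_level(documented):
    """Return the scale level (A-D) for an iterable of documented enabler keys."""
    found = set(documented)  # duplicates collapse, so each enabler counts once
    unknown = found - ENABLERS
    if unknown:
        raise ValueError(f"Unknown enablers: {sorted(unknown)}")
    count = len(found)
    if count >= 3:
        return "A"  # three of the described enablers or more
    if count == 2:
        return "B"
    if count == 1:
        return "C"
    return "D"  # none of the described enablers


print(human_agency_level(["understand", "discuss", "understand"]))  # B
```

Rejecting unknown keys keeps the rating tied to the enablers listed in the definition. Accepting other enablers would require an explicit justification rule, like the one an e-Delphi panellist proposed for eco-responsibility practices.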